Implementation:EvolvingLMMs Lab Lmms eval LLM Judge Factory

Overview

ProviderFactory creates LLM judge instances from an API type string. It applies the Factory design pattern to hide provider instantiation, so callers can switch between API backends (OpenAI, Azure OpenAI, their async variants, and a dummy implementation) through configuration rather than code changes.

File Location

lmms_eval/llm_judge/factory.py (54 lines)

Related Principle

LLM as Judge

Dependencies

os: Environment variable access
typing: Type hints
LLM Judge Base: ServerInterface base class
LLM Judge Protocol: ServerConfig
Provider implementations:
- OpenAIProvider: OpenAI API implementation
- AzureOpenAIProvider: Azure OpenAI API implementation
- AsyncOpenAIProvider: Async OpenAI implementation
- AsyncAzureOpenAIProvider: Async Azure OpenAI implementation
- DummyProvider: Testing/mock implementation

Core Components

ProviderFactory

Factory class for creating judge instances based on API type.

Class Attributes

_provider_classes

_provider_classes = {
    "openai": OpenAIProvider,
    "azure": AzureOpenAIProvider,
    "async_openai": AsyncOpenAIProvider,
    "async_azure": AsyncAzureOpenAIProvider,
    "dummy": DummyProvider
}

Dictionary mapping API type strings to provider class implementations. This registry pattern enables:

Easy lookup of provider classes
Dynamic provider registration
Clear enumeration of supported providers
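A minimal, self-contained sketch of the registry idea follows. The classes (BaseProvider, EchoProvider, NullProvider, LateProvider) and helper functions are hypothetical stand-ins for illustration, not the lmms_eval providers.

```python
class BaseProvider:
    """Hypothetical stand-in for ServerInterface."""


class EchoProvider(BaseProvider):
    pass


class NullProvider(BaseProvider):
    pass


# Registry: string key -> provider class (the class itself, not an instance).
registry = {"echo": EchoProvider, "null": NullProvider}


def supported_types():
    """Enumerate the registered keys in insertion order."""
    return list(registry)


def lookup(api_type):
    """Return the class registered under api_type, or None if absent."""
    return registry.get(api_type)


# Dynamic registration is a plain dictionary assignment.
class LateProvider(BaseProvider):
    pass


registry["late"] = LateProvider
```

Because the registry stores classes rather than instances, nothing is constructed until a caller asks for a provider.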

Methods

register_additional_providers

@classmethod
def register_additional_providers(cls):
    """Register additional providers if available"""
    pass

Placeholder method for registering additional providers dynamically. Currently not implemented.

Design Note: The TODO comment indicates this should be a decorator-based registration system in the future.

Future Pattern:

@ProviderFactory.register("custom_provider")
class CustomProvider(ServerInterface):
    ...
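One way such a decorator could work is sketched below. This is an assumption about a possible design, not existing lmms_eval code: SketchFactory, its register method, and the stand-in ServerInterface are all hypothetical.

```python
class ServerInterface:
    """Stand-in base class used only in this sketch."""


class SketchFactory:
    """Hypothetical factory; not the lmms_eval ProviderFactory."""

    _provider_classes = {}

    @classmethod
    def register(cls, api_type):
        """Return a class decorator that records the class under api_type."""

        def decorator(provider_cls):
            if not issubclass(provider_cls, ServerInterface):
                raise ValueError(f"{provider_cls} must be a subclass of ServerInterface")
            # Store the key lowercased so later lookups can be case-insensitive.
            cls._provider_classes[api_type.lower()] = provider_cls
            return provider_cls  # hand the class back unchanged

        return decorator


@SketchFactory.register("custom_provider")
class CustomProvider(ServerInterface):
    pass
```

The decorator returns the class unchanged, so CustomProvider remains usable directly as well as through the registry.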

create_provider

@classmethod
def create_provider(
    cls,
    api_type: Optional[str] = None,
    config: Optional[ServerConfig] = None
) -> ServerInterface: ...

Create a judge instance based on API type.

Parameters:

api_type (Optional[str]): Type of API to use. If None, the value is read from the API_TYPE environment variable, falling back to "openai". Matching is case-insensitive. Supported values:
- "openai": OpenAI API (synchronous)
- "azure": Azure OpenAI API (synchronous)
- "async_openai": OpenAI API (asynchronous)
- "async_azure": Azure OpenAI API (asynchronous)
- "dummy": Mock implementation for testing
config (Optional[ServerConfig]): Configuration for the judge instance (model name, temperature, retry settings, etc.)

Returns:

ServerInterface: An instance of the appropriate judge implementation

Raises:

ValueError: If api_type is not in the supported providers list

Process:

If api_type is None, reads from environment variable API_TYPE (default: "openai")
Converts api_type to lowercase for case-insensitive matching
Validates api_type against _provider_classes registry
Retrieves appropriate provider class from registry
Instantiates and returns provider with given config
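The steps above can be sketched as follows. This is a reconstruction from the description on this page, not a copy of factory.py: SketchFactory, OpenAIStub, and DummyStub are hypothetical stand-ins, and passing config positionally to the constructor is an assumption.

```python
import os
from typing import Optional


class ServerInterface:
    """Stand-in base class used only in this sketch."""

    def __init__(self, config=None):
        self.config = config


class OpenAIStub(ServerInterface):
    pass


class DummyStub(ServerInterface):
    pass


class SketchFactory:
    _provider_classes = {"openai": OpenAIStub, "dummy": DummyStub}

    @classmethod
    def create_provider(cls, api_type: Optional[str] = None, config=None) -> ServerInterface:
        # Step 1: fall back to the environment, then to "openai".
        if api_type is None:
            api_type = os.getenv("API_TYPE", "openai")
        # Step 2: lowercase for case-insensitive matching.
        api_type = api_type.lower()
        # Step 3: validate against the registry.
        if api_type not in cls._provider_classes:
            raise ValueError(
                f"Unknown API type: {api_type}. "
                f"Supported types: {list(cls._provider_classes.keys())}"
            )
        # Steps 4 and 5: look up the class and instantiate it with the config.
        return cls._provider_classes[api_type](config)
```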

Example Usage:

import os

from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

# Explicit provider type
config = ServerConfig(model_name="gpt-4", temperature=0.0)
judge = ProviderFactory.create_provider("openai", config)

# From environment variable (used only when api_type is omitted)
os.environ["API_TYPE"] = "azure"
judge = ProviderFactory.create_provider(config=config)

# Async variant
async_judge = ProviderFactory.create_provider("async_openai", config)

register_provider

@classmethod
def register_provider(cls, api_type: str, judge_class: type):
    """Register a new judge implementation"""
    if not issubclass(judge_class, ServerInterface):
        raise ValueError(f"{judge_class} must be a subclass of ServerInterface")
    cls._provider_classes[api_type] = judge_class

Dynamically register a new judge implementation at runtime.

Parameters:

api_type (str): Identifier for the new provider (e.g., "anthropic", "cohere")
judge_class (type): Class implementing ServerInterface

Raises:

ValueError: If judge_class is not a subclass of ServerInterface

Validation:

Ensures type safety by validating inheritance from ServerInterface
Prevents registration of incompatible classes

Example:

class AnthropicProvider(ServerInterface):
    def evaluate(self, request):
        # Implementation
        ...

    def is_available(self):
        return True

# Register the provider
ProviderFactory.register_provider("anthropic", AnthropicProvider)

# Now it can be used
judge = ProviderFactory.create_provider("anthropic", config)
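One consequence worth noting: register_provider, as listed above, stores the key exactly as given, while create_provider is described as lowercasing its argument before lookup. If both behave as described, a key registered with uppercase letters would never match. The sketch below reproduces just those two behaviours with hypothetical stand-ins (SketchFactory, AnthropicStub) to illustrate why lowercase keys are the safer choice.

```python
class ServerInterface:
    """Stand-in base class used only in this sketch."""

    def __init__(self, config=None):
        self.config = config


class SketchFactory:
    _provider_classes = {}

    @classmethod
    def register_provider(cls, api_type, judge_class):
        if not issubclass(judge_class, ServerInterface):
            raise ValueError(f"{judge_class} must be a subclass of ServerInterface")
        cls._provider_classes[api_type] = judge_class  # key stored verbatim

    @classmethod
    def create_provider(cls, api_type, config=None):
        api_type = api_type.lower()  # lookup key is lowercased
        if api_type not in cls._provider_classes:
            raise ValueError(f"Unknown API type: {api_type}")
        return cls._provider_classes[api_type](config)


class AnthropicStub(ServerInterface):
    pass


# A mixed-case key is stored as "Anthropic" but looked up as "anthropic".
SketchFactory.register_provider("Anthropic", AnthropicStub)
try:
    SketchFactory.create_provider("Anthropic")
    reachable = True
except ValueError:
    reachable = False

# Registering under a lowercase key makes every casing of the request work.
SketchFactory.register_provider("anthropic", AnthropicStub)
judge = SketchFactory.create_provider("Anthropic")
```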

Design Patterns

Factory Pattern

Encapsulates object creation logic, allowing clients to request objects by type without knowing concrete class details.

Benefits:

Decouples client code from concrete implementations
Centralizes provider selection logic
Simplifies testing by allowing mock provider injection
Enables configuration-driven provider selection

Registry Pattern

The _provider_classes dictionary acts as a registry of available providers, supporting dynamic lookup and registration.

Benefits:

Easy to extend with new providers
Clear enumeration of supported types
Runtime provider discovery

Environment-based Configuration

When api_type is not specified, the factory falls back to the API_TYPE environment variable, following twelve-factor app principles.

Benefits:

Environment-specific configuration without code changes
Easier deployment across different environments
Configuration through environment variables or explicit parameters
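When testing environment-driven selection, it helps to scope the variable so it does not leak between tests. The sketch below uses unittest.mock.patch.dict from the standard library; resolve_api_type is a hypothetical helper that mirrors the resolution order described on this page (explicit argument, then API_TYPE, then "openai").

```python
import os
from unittest.mock import patch


def resolve_api_type(api_type=None):
    """Explicit argument wins; otherwise API_TYPE; otherwise "openai"."""
    if api_type is None:
        api_type = os.getenv("API_TYPE", "openai")
    return api_type.lower()


# The variable is set only inside the with-block and restored afterwards.
with patch.dict(os.environ, {"API_TYPE": "Azure"}):
    from_env = resolve_api_type()
    explicit = resolve_api_type("dummy")

# clear=True temporarily empties the environment to exercise the default.
with patch.dict(os.environ, {}, clear=True):
    fallback = resolve_api_type()
```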

Provider Types

Synchronous Providers

openai: Standard OpenAI API client (blocking I/O)
azure: Azure OpenAI API client (blocking I/O)
dummy: Mock implementation for testing

Use Cases:

Simple scripts with sequential evaluation
Low-volume evaluation tasks
Synchronous application contexts

Asynchronous Providers

async_openai: Async OpenAI API client (non-blocking I/O)
async_azure: Async Azure OpenAI API client (non-blocking I/O)

Use Cases:

High-throughput batch evaluation
Concurrent request processing
Integration with async frameworks (aiohttp, FastAPI)
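To illustrate what bounded concurrency means in practice, the sketch below caps in-flight calls with asyncio.Semaphore. It is a generic pattern, not the lmms_eval async providers: fake_judge_call and run_batch are hypothetical, and whether the real providers enforce max_concurrent this way is an assumption.

```python
import asyncio


async def fake_judge_call(i, semaphore, state):
    """Simulate one judge request while tracking how many run at once."""
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stands in for network I/O
        state["active"] -= 1
        return f"result-{i}"


async def run_batch(n_requests, max_concurrent):
    semaphore = asyncio.Semaphore(max_concurrent)
    state = {"active": 0, "peak": 0}
    # gather returns results in the order the coroutines were passed in.
    results = await asyncio.gather(
        *(fake_judge_call(i, semaphore, state) for i in range(n_requests))
    )
    return results, state["peak"]


results, peak = asyncio.run(run_batch(n_requests=8, max_concurrent=3))
```

The semaphore limits how many calls are in flight, while gather still returns results in request order, so callers can zip them back to their inputs.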

Testing Provider

dummy: Returns mock responses without API calls

Use Cases:

Unit testing evaluation pipelines
Development without API access
CI/CD environments

Usage Patterns

Basic Usage

from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

config = ServerConfig(
    model_name="gpt-4",
    temperature=0.0,
    max_tokens=512,
    timeout=30
)

judge = ProviderFactory.create_provider("openai", config)
result = judge.evaluate_binary(
    question="What is 2+2?",
    answer="4",
    prediction="Four"
)

Environment-Based Configuration

import os

# Set via environment
os.environ["API_TYPE"] = "azure"

# Factory reads from environment
judge = ProviderFactory.create_provider(config=config)

Async Batch Processing

import asyncio

config = ServerConfig(
    model_name="gpt-4-turbo",
    max_concurrent=10  # Limit concurrent requests
)

async_judge = ProviderFactory.create_provider("async_openai", config)

async def batch_evaluate():
    results = await async_judge.evaluate_binary_batch_async(
        questions=["Q1", "Q2", "Q3"],
        answers=["A1", "A2", "A3"],
        predictions=["P1", "P2", "P3"]
    )
    return results

results = asyncio.run(batch_evaluate())

Custom Provider Registration

from lmms_eval.llm_judge.base import ServerInterface
from lmms_eval.llm_judge.protocol import Request, Response

class CustomLLMProvider(ServerInterface):
    def evaluate(self, request: Request) -> Response:
        # Custom implementation
        return Response(
            content="Custom evaluation result",
            model_used=self.config.model_name,
            success=True
        )

    def is_available(self) -> bool:
        return True

# Register
ProviderFactory.register_provider("custom_llm", CustomLLMProvider)

# Use
judge = ProviderFactory.create_provider("custom_llm", config)

Testing with Dummy Provider

# Use dummy provider for testing
test_judge = ProviderFactory.create_provider("dummy")

# Returns mock responses without API calls
result = test_judge.evaluate_binary(
    question="Test question",
    answer="Test answer",
    prediction="Test prediction"
)

Configuration Options

The factory accepts ServerConfig with the following key parameters:

model_name: Model identifier (e.g., "gpt-4", "gpt-3.5-turbo")
temperature: Sampling temperature (default: 0.0 for deterministic evaluation)
max_tokens: Maximum response length (default: 1024)
timeout: API request timeout in seconds (default: 60)
num_retries: Number of retry attempts on failure (default: 5)
retry_delay: Delay between retries in seconds (default: 10)
max_concurrent: Maximum concurrent requests for async providers (default: 10)
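For orientation, the parameters above could be modelled as a dataclass like the one below. ServerConfigSketch is a hypothetical stand-in: the field names and defaults are taken from the list above, but the field types and the choice to make model_name required are assumptions, and the real ServerConfig may define additional fields.

```python
from dataclasses import dataclass


@dataclass
class ServerConfigSketch:
    """Illustrative stand-in for ServerConfig; not the lmms_eval class."""

    model_name: str              # required here; the real default is not stated
    temperature: float = 0.0     # deterministic evaluation
    max_tokens: int = 1024       # maximum response length
    timeout: int = 60            # seconds per API request
    num_retries: int = 5         # retry attempts on failure
    retry_delay: int = 10        # seconds between retries
    max_concurrent: int = 10     # cap for async providers


# Override only what differs from the defaults.
cfg = ServerConfigSketch(model_name="gpt-4", max_tokens=512)
```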

Error Handling

Unknown API Type:

try:
    judge = ProviderFactory.create_provider("unknown_type")
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unknown API type: unknown_type.
    # Supported types: ['openai', 'azure', 'async_openai', 'async_azure', 'dummy']

Invalid Provider Class:

class NotAJudge:
    pass

try:
    ProviderFactory.register_provider("bad", NotAJudge)
except ValueError as e:
    print(f"Error: {e}")
    # Error: <class '__main__.NotAJudge'> must be a subclass of ServerInterface
    # (the module prefix depends on where NotAJudge is defined)

Extension Guidelines

Adding a New Provider

Implement ServerInterface or AsyncServerInterface (register_provider accepts only subclasses of ServerInterface)
Register via register_provider()
Document provider-specific configuration requirements
Add to _provider_classes if part of core distribution

Example:

from lmms_eval.llm_judge.base import AsyncServerInterface

class HuggingFaceProvider(AsyncServerInterface):
    async def evaluate_async(self, request: Request) -> Response:
        # HuggingFace API implementation
        ...

    def is_available(self) -> bool:
        # Check HF API availability
        ...

ProviderFactory.register_provider("huggingface", HuggingFaceProvider)

Related Implementations

LLM Judge Base: Base classes that providers must implement
LLM Judge Protocol: ServerConfig and other protocol types
Provider implementations in lmms_eval/llm_judge/providers/

Best Practices

Use environment variables for production deployments to avoid hardcoding API types
Choose async providers for batch evaluation (>10 requests)
Use dummy provider for unit tests to avoid API costs
Set appropriate max_concurrent limits to respect API rate limits
Register custom providers rather than modifying factory code directly
Validate configurations before passing to factory
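As one possible reading of the last point, a small pre-flight check can catch obviously bad values before a provider is created. validate_config below is a hypothetical helper, not part of lmms_eval; the rules it applies are assumptions based on the parameter descriptions on this page, and SimpleNamespace stands in for a real ServerConfig.

```python
from types import SimpleNamespace


def validate_config(cfg):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if not getattr(cfg, "model_name", ""):
        problems.append("model_name must be a non-empty string")
    if getattr(cfg, "temperature", 0.0) < 0:
        problems.append("temperature must be >= 0")
    for field in ("max_tokens", "timeout", "max_concurrent"):
        if getattr(cfg, field, 1) <= 0:
            problems.append(f"{field} must be positive")
    if getattr(cfg, "num_retries", 0) < 0:
        problems.append("num_retries must be >= 0")
    return problems


good = SimpleNamespace(model_name="gpt-4", temperature=0.0, max_tokens=512,
                       timeout=30, max_concurrent=10, num_retries=5)
bad = SimpleNamespace(model_name="", temperature=0.0, max_tokens=0,
                      timeout=30, max_concurrent=10, num_retries=5)
```

Returning a list rather than raising on the first problem lets the caller report every issue at once.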
