Implementation:EvolvingLMMs Lab Lmms eval LLM Judge Factory
Overview
This implementation provides a factory class for creating judge instances based on configuration. It implements the Factory design pattern to abstract provider instantiation, allowing easy switching between different API backends (OpenAI, Azure OpenAI, async variants, dummy implementations) through configuration rather than code changes.
File Location
lmms_eval/llm_judge/factory.py (54 lines)
Dependencies
- os: Environment variable access
- typing: Type hints
- LLM Judge Base: ServerInterface base class
- LLM Judge Protocol: ServerConfig
- Provider implementations:
  - OpenAIProvider: OpenAI API implementation
  - AzureOpenAIProvider: Azure OpenAI API implementation
  - AsyncOpenAIProvider: Async OpenAI implementation
  - AsyncAzureOpenAIProvider: Async Azure OpenAI implementation
  - DummyProvider: Testing/mock implementation
Core Components
ProviderFactory
Factory class for creating judge instances based on API type.
Class Attributes
_provider_classes
_provider_classes = {
    "openai": OpenAIProvider,
    "azure": AzureOpenAIProvider,
    "async_openai": AsyncOpenAIProvider,
    "async_azure": AsyncAzureOpenAIProvider,
    "dummy": DummyProvider
}
Dictionary mapping API type strings to provider class implementations. This registry pattern enables:
- Easy lookup of provider classes
- Dynamic provider registration
- Clear enumeration of supported providers
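The registry idea can be sketched in isolation. The classes below are hypothetical stand-ins, not the real lmms_eval providers, and `provider_classes` is an illustrative name; the point is only that a plain dict of classes supports lookup, enumeration, and runtime additions.

```python
# Stand-in classes for illustration only; the real providers live in lmms_eval.
class ServerInterface:
    """Hypothetical minimal base class."""


class OpenAIProvider(ServerInterface):
    pass


class DummyProvider(ServerInterface):
    pass


# The registry maps an API type string to a provider class (not an instance).
provider_classes = {
    "openai": OpenAIProvider,
    "dummy": DummyProvider,
}

# Lookup: fetch the class by name, then instantiate it.
provider = provider_classes["dummy"]()

# Enumeration: the supported types are simply the dict keys.
supported = sorted(provider_classes)


# Dynamic registration: adding a key makes a new type available at runtime.
class LocalProvider(ServerInterface):
    pass


provider_classes["local"] = LocalProvider
```

Storing classes rather than instances keeps the registry cheap to build and defers any client setup until a provider is actually requested.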
Methods
register_additional_providers
@classmethod
def register_additional_providers(cls):
    """Register additional providers if available"""
    pass
Placeholder method for registering additional providers dynamically. Currently not implemented.
Design Note: The TODO comment indicates this should be a decorator-based registration system in the future.
Future Pattern:
@ProviderFactory.register("custom_provider")
class CustomProvider(ServerInterface):
    ...
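Since the decorator does not exist yet, the following is only one way it might be built. `ProviderRegistry` and its `register` method are hypothetical names, `ServerInterface` is a stand-in, and lowercasing the key is an assumption chosen to match the case-insensitive lookup in create_provider.

```python
from typing import Callable, Dict, Type


class ServerInterface:
    """Stand-in base class (an assumption, not the lmms_eval class)."""


class ProviderRegistry:
    """Hypothetical registry exposing a decorator-style `register` method."""

    _provider_classes: Dict[str, Type[ServerInterface]] = {}

    @classmethod
    def register(cls, api_type: str) -> Callable[[type], type]:
        def decorator(provider_class: type) -> type:
            # Same inheritance check that register_provider performs.
            if not issubclass(provider_class, ServerInterface):
                raise ValueError(
                    f"{provider_class} must be a subclass of ServerInterface"
                )
            # Assumption: keys are stored lowercase.
            cls._provider_classes[api_type.lower()] = provider_class
            # Return the class unchanged so the decorated name still refers to it.
            return provider_class

        return decorator


@ProviderRegistry.register("custom_provider")
class CustomProvider(ServerInterface):
    pass
```

With this shape, registration happens as a side effect of importing the module that defines the provider, so the factory file itself never needs to change.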
create_provider
@classmethod
def create_provider(
    cls,
    api_type: Optional[str] = None,
    config: Optional[ServerConfig] = None
) -> ServerInterface
Create a judge instance based on API type.
Parameters:
- api_type (Optional[str]): Type of API to use. If None, the value is read from the API_TYPE environment variable (defaults to "openai"). Supported values:
  - "openai": OpenAI API (synchronous)
  - "azure": Azure OpenAI API (synchronous)
  - "async_openai": OpenAI API (asynchronous)
  - "async_azure": Azure OpenAI API (asynchronous)
  - "dummy": Mock implementation for testing
- config (Optional[ServerConfig]): Configuration for the judge instance (model name, temperature, retry settings, etc.)
Returns:
- ServerInterface: An instance of the appropriate judge implementation
Raises:
- ValueError: If api_type is not in the supported providers list
Process:
- If api_type is None, reads from environment variable API_TYPE (default: "openai")
- Converts api_type to lowercase for case-insensitive matching
- Validates api_type against _provider_classes registry
- Retrieves appropriate provider class from registry
- Instantiates and returns provider with given config
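The steps above can be sketched as follows. This is a minimal reconstruction from the description, not the actual source: `SketchFactory` and the provider classes are stand-ins, and passing the configuration as a `config` keyword argument is an assumption.

```python
import os
from typing import Dict, Optional, Type


class ServerInterface:
    """Stand-in base class; it only stores the config it is given."""

    def __init__(self, config=None):
        self.config = config


class OpenAIProvider(ServerInterface):
    pass


class DummyProvider(ServerInterface):
    pass


class SketchFactory:
    """Hypothetical factory following the documented process."""

    _provider_classes: Dict[str, Type[ServerInterface]] = {
        "openai": OpenAIProvider,
        "dummy": DummyProvider,
    }

    @classmethod
    def create_provider(
        cls, api_type: Optional[str] = None, config=None
    ) -> ServerInterface:
        # Step 1: fall back to API_TYPE, then to "openai".
        if api_type is None:
            api_type = os.getenv("API_TYPE", "openai")
        # Step 2: normalize case so "OpenAI" and "openai" match.
        api_type = api_type.lower()
        # Step 3: validate against the registry.
        if api_type not in cls._provider_classes:
            raise ValueError(
                f"Unknown API type: {api_type}. "
                f"Supported types: {list(cls._provider_classes.keys())}"
            )
        # Steps 4 and 5: look up the class and instantiate it.
        return cls._provider_classes[api_type](config=config)
```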
Example Usage:
import os

from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

# Explicit provider type
config = ServerConfig(model_name="gpt-4", temperature=0.0)
judge = ProviderFactory.create_provider("openai", config)

# From environment variable
os.environ["API_TYPE"] = "azure"
judge = ProviderFactory.create_provider(config=config)

# Async variant
async_judge = ProviderFactory.create_provider("async_openai", config)
register_provider
@classmethod
def register_provider(cls, api_type: str, judge_class: type):
    """Register a new judge implementation"""
    if not issubclass(judge_class, ServerInterface):
        raise ValueError(f"{judge_class} must be a subclass of ServerInterface")
    cls._provider_classes[api_type] = judge_class
Dynamically register a new judge implementation at runtime.
Parameters:
- api_type (str): Identifier for the new provider (e.g., "anthropic", "cohere")
- judge_class (type): Class implementing ServerInterface
Raises:
- ValueError: If judge_class is not a subclass of ServerInterface
Validation:
- Ensures type safety by validating inheritance from ServerInterface
- Prevents registration of incompatible classes
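One edge case is worth knowing: Python's `issubclass()` raises TypeError, not ValueError, when its first argument is not a class (for example an instance or a string). The sketch below shows a stricter check that reports both cases uniformly; `validate_provider_class` and `is_accepted` are hypothetical helpers and `ServerInterface` is a stand-in.

```python
class ServerInterface:
    """Stand-in base class for illustration."""


def validate_provider_class(judge_class) -> None:
    """Hypothetical helper: reject anything that is not a ServerInterface subclass."""
    # issubclass() raises TypeError for non-class arguments, so test for a class first.
    if not isinstance(judge_class, type) or not issubclass(judge_class, ServerInterface):
        raise ValueError(f"{judge_class} must be a subclass of ServerInterface")


class GoodProvider(ServerInterface):
    pass


class NotAJudge:
    pass


def is_accepted(candidate) -> bool:
    """Return True when the candidate passes validation."""
    try:
        validate_provider_class(candidate)
    except ValueError:
        return False
    return True
```

Passing an instance such as `GoodProvider()` is a common mistake, and with this guard it yields the same readable error as an unrelated class would.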
Example:
class AnthropicProvider(ServerInterface):
    def evaluate(self, request):
        # Implementation
        ...

    def is_available(self):
        return True

# Register the provider
ProviderFactory.register_provider("anthropic", AnthropicProvider)

# Now it can be used
judge = ProviderFactory.create_provider("anthropic", config)
Design Patterns
Factory Pattern
Encapsulates object creation logic, allowing clients to request objects by type without knowing concrete class details.
Benefits:
- Decouples client code from concrete implementations
- Centralizes provider selection logic
- Simplifies testing by allowing mock provider injection
- Enables configuration-driven provider selection
Registry Pattern
The _provider_classes dictionary acts as a registry of available providers, supporting dynamic lookup and registration.
Benefits:
- Easy to extend with new providers
- Clear enumeration of supported types
- Runtime provider discovery
Environment-based Configuration
Falls back to the API_TYPE environment variable when api_type is not specified, following 12-factor app principles.
Benefits:
- Environment-specific configuration without code changes
- Easier deployment across different environments
- Configuration through environment variables or explicit parameters
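When code depends on an environment variable, tests and scripts benefit from scoping any override so it cannot leak into later code. A minimal sketch using the standard library's `unittest.mock.patch.dict`; `resolve_api_type` is a hypothetical helper that mirrors the documented precedence (explicit argument, then API_TYPE, then "openai").

```python
import os
from unittest.mock import patch


def resolve_api_type(api_type=None) -> str:
    """Hypothetical helper: explicit argument, then API_TYPE, then "openai"."""
    if api_type is None:
        api_type = os.getenv("API_TYPE", "openai")
    return api_type.lower()


# patch.dict restores os.environ on exit, so the override stays local.
with patch.dict(os.environ, {"API_TYPE": "Azure"}):
    from_env = resolve_api_type()
    explicit = resolve_api_type("dummy")  # an explicit argument wins over the env

# With the environment emptied for the duration of the block, the default applies.
with patch.dict(os.environ, {}, clear=True):
    default = resolve_api_type()
```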
Provider Types
Synchronous Providers
- openai: Standard OpenAI API client (blocking I/O)
- azure: Azure OpenAI API client (blocking I/O)
- dummy: Mock implementation for testing
Use Cases:
- Simple scripts with sequential evaluation
- Low-volume evaluation tasks
- Synchronous application contexts
Asynchronous Providers
- async_openai: Async OpenAI API client (non-blocking I/O)
- async_azure: Async Azure OpenAI API client (non-blocking I/O)
Use Cases:
- High-throughput batch evaluation
- Concurrent request processing
- Integration with async frameworks (aiohttp, FastAPI)
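Concurrent request processing is usually bounded so that a large batch does not exceed API rate limits. The sketch below shows the general technique with `asyncio.Semaphore`; it is not taken from the lmms_eval providers, and `fake_judge_call` is a stand-in for a real network request.

```python
import asyncio
from typing import List


async def fake_judge_call(prompt: str) -> str:
    """Stand-in for a network call; a real provider would hit an API here."""
    await asyncio.sleep(0.01)
    return f"judged:{prompt}"


async def evaluate_all(prompts: List[str], max_concurrent: int = 10) -> List[str]:
    # At most max_concurrent calls are in flight at any moment.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_judge_call(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(evaluate_all(["Q1", "Q2", "Q3"], max_concurrent=2))
```

Because `gather` preserves input order, results can be zipped back onto the original questions even though the calls complete out of order.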
Testing Provider
- dummy: Returns mock responses without API calls
Use Cases:
- Unit testing evaluation pipelines
- Development without API access
- CI/CD environments
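To illustrate why a mock judge helps in unit tests, here is a self-contained sketch. `FakeJudge`, `FakeResponse`, and `accuracy` are hypothetical names invented for this example; the behavior of the real DummyProvider is not specified here and may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    """Minimal response object carrying the judge's verdict."""

    content: str
    success: bool = True


class FakeJudge:
    """Hypothetical stand-in for a dummy provider: canned reply, no network."""

    def __init__(self, canned: str = "1"):
        self.canned = canned
        self.calls = 0

    def evaluate(self, request: str) -> FakeResponse:
        self.calls += 1
        return FakeResponse(content=self.canned)


def accuracy(judge, prompts: List[str]) -> float:
    """Toy pipeline: fraction of prompts the judge marks with "1"."""
    if not prompts:
        return 0.0
    verdicts = [judge.evaluate(p).content for p in prompts]
    return sum(v == "1" for v in verdicts) / len(prompts)
```

A test can then assert on both the pipeline's output and the number of judge calls without any API key or network access.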
Usage Patterns
Basic Usage
from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

config = ServerConfig(
    model_name="gpt-4",
    temperature=0.0,
    max_tokens=512,
    timeout=30
)

judge = ProviderFactory.create_provider("openai", config)

result = judge.evaluate_binary(
    question="What is 2+2?",
    answer="4",
    prediction="Four"
)
Environment-Based Configuration
import os
# Set via environment
os.environ["API_TYPE"] = "azure"
# Factory reads from environment
judge = ProviderFactory.create_provider(config=config)
Async Batch Processing
import asyncio

from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

config = ServerConfig(
    model_name="gpt-4-turbo",
    max_concurrent=10  # Limit concurrent requests
)

async_judge = ProviderFactory.create_provider("async_openai", config)

async def batch_evaluate():
    results = await async_judge.evaluate_binary_batch_async(
        questions=["Q1", "Q2", "Q3"],
        answers=["A1", "A2", "A3"],
        predictions=["P1", "P2", "P3"]
    )
    return results

results = asyncio.run(batch_evaluate())
Custom Provider Registration
from lmms_eval.llm_judge.base import ServerInterface
from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import Request, Response

class CustomLLMProvider(ServerInterface):
    def evaluate(self, request: Request) -> Response:
        # Custom implementation
        return Response(
            content="Custom evaluation result",
            model_used=self.config.model_name,
            success=True
        )

    def is_available(self) -> bool:
        return True

# Register
ProviderFactory.register_provider("custom_llm", CustomLLMProvider)

# Use
judge = ProviderFactory.create_provider("custom_llm", config)
Testing with Dummy Provider
# Use dummy provider for testing
test_judge = ProviderFactory.create_provider("dummy")

# Returns mock responses without API calls
result = test_judge.evaluate_binary(
    question="Test question",
    answer="Test answer",
    prediction="Test prediction"
)
Configuration Options
The factory accepts ServerConfig with the following key parameters:
- model_name: Model identifier (e.g., "gpt-4", "gpt-3.5-turbo")
- temperature: Sampling temperature (default: 0.0 for deterministic evaluation)
- max_tokens: Maximum response length (default: 1024)
- timeout: API request timeout in seconds (default: 60)
- num_retries: Number of retry attempts on failure (default: 5)
- retry_delay: Delay between retries in seconds (default: 10)
- max_concurrent: Maximum concurrent requests for async providers (default: 10)
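For orientation, the documented defaults can be mirrored in a small dataclass. `ServerConfigSketch` is a hypothetical stand-in, not the real ServerConfig: the field types and the choice to make model_name required are assumptions, and the real class may carry additional fields.

```python
from dataclasses import dataclass


@dataclass
class ServerConfigSketch:
    """Hypothetical mirror of the documented defaults; not the real ServerConfig."""

    model_name: str            # assumption: required, no default documented
    temperature: float = 0.0   # deterministic evaluation
    max_tokens: int = 1024     # maximum response length
    timeout: int = 60          # seconds per API request
    num_retries: int = 5       # retry attempts on failure
    retry_delay: int = 10      # seconds between retries
    max_concurrent: int = 10   # async providers only


# Override only what differs from the defaults.
config = ServerConfigSketch(model_name="gpt-4", max_tokens=512, timeout=30)
```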
Error Handling
Unknown API Type:
try:
    judge = ProviderFactory.create_provider("unknown_type")
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unknown API type: unknown_type.
    # Supported types: ['openai', 'azure', 'async_openai', 'async_azure', 'dummy']
Invalid Provider Class:
class NotAJudge:
    pass

try:
    ProviderFactory.register_provider("bad", NotAJudge)
except ValueError as e:
    print(f"Error: {e}")
    # Error: <class '__main__.NotAJudge'> must be a subclass of ServerInterface
    # (the module prefix depends on where NotAJudge is defined)
Extension Guidelines
Adding a New Provider
- Implement ServerInterface or AsyncServerInterface
- Register via register_provider()
- Document provider-specific configuration requirements
- Add to _provider_classes if part of core distribution
Example:
from lmms_eval.llm_judge.base import AsyncServerInterface
from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import Request, Response

class HuggingFaceProvider(AsyncServerInterface):
    async def evaluate_async(self, request: Request) -> Response:
        # HuggingFace API implementation
        ...

    def is_available(self) -> bool:
        # Check HF API availability
        ...

ProviderFactory.register_provider("huggingface", HuggingFaceProvider)
Related Implementations
- LLM Judge Base: Base classes that providers must implement
- LLM Judge Protocol: ServerConfig and other protocol types
- Provider implementations in lmms_eval/llm_judge/providers/
Best Practices
- Use environment variables for production deployments to avoid hardcoding API types
- Choose async providers for batch evaluation (>10 requests)
- Use dummy provider for unit tests to avoid API costs
- Set appropriate max_concurrent limits to respect API rate limits
- Register custom providers rather than modifying factory code directly
- Validate configurations before passing to factory
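The last practice can be as simple as a function that collects problems before the factory is called. In the sketch below, `config_problems` is a hypothetical helper, and the bounds it checks are illustrative assumptions, not limits enforced by lmms_eval.

```python
from typing import List


def config_problems(
    model_name: str,
    temperature: float,
    max_tokens: int,
    timeout: float,
    max_concurrent: int,
) -> List[str]:
    """Return human-readable problems; an empty list means the values look sane.

    The bounds are illustrative assumptions, not rules taken from lmms_eval.
    """
    problems = []
    if not model_name.strip():
        problems.append("model_name must be non-empty")
    if temperature < 0:
        problems.append("temperature must be >= 0")
    if max_tokens <= 0:
        problems.append("max_tokens must be positive")
    if timeout <= 0:
        problems.append("timeout must be positive")
    if max_concurrent < 1:
        problems.append("max_concurrent must be at least 1")
    return problems


ok = config_problems("gpt-4", 0.0, 512, 30, 10)
bad = config_problems("", -1.0, 0, 30, 10)
```

Collecting every problem instead of raising on the first one lets a caller report all configuration mistakes in a single pass.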