Implementation: EvolvingLMMs-Lab lmms-eval Evaluate Endpoint
| Knowledge Sources | |
|---|---|
| Domains | Server, Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A REST endpoint for submitting evaluation jobs to the lmms-eval server, with request validation and queue placement handled by the framework.
Description
The POST /evaluate endpoint accepts an EvaluateRequest JSON body, validates it through Pydantic schema enforcement, and delegates to JobScheduler.add_job() to create a new job entry and enqueue it. The endpoint returns a JobSubmitResponse containing the generated job ID, initial status of "queued", the position in the queue, and a human-readable confirmation message.
The EvaluateRequest model defines two required fields (model and tasks) and several optional fields for controlling evaluation behavior. The JobScheduler.add_job() method generates a UUID4 job identifier, creates a JobInfo record under an async lock, and places the job ID onto the internal asyncio.Queue for background processing.
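The queueing flow described above can be sketched as a minimal stand-alone approximation. This is not the actual JobScheduler implementation; `MiniScheduler` and the simplified `JobInfo` record here are illustrative stand-ins, assuming only the behavior stated above (UUID4 id, record created under an async lock, job id placed on an asyncio.Queue):

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class JobInfo:
    # Simplified placeholder for the real JobInfo record
    job_id: str
    status: str = "queued"
    request: dict = field(default_factory=dict)


class MiniScheduler:
    """Toy sketch of an add_job-style method, not the lmms-eval implementation."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobInfo] = {}
        self._lock = asyncio.Lock()
        self._queue: asyncio.Queue = asyncio.Queue()

    async def add_job(self, request: dict) -> tuple[str, int]:
        job_id = str(uuid.uuid4())
        async with self._lock:            # protect the shared job table
            self._jobs[job_id] = JobInfo(job_id=job_id, request=request)
        await self._queue.put(job_id)     # hand off to the background worker
        return job_id, self._queue.qsize() - 1  # zero-indexed queue position


async def demo() -> tuple[str, int]:
    sched = MiniScheduler()
    return await sched.add_job({"model": "llava", "tasks": ["mme"]})
```

A first submission lands at position 0, matching the zero-indexed `position_in_queue` returned by the endpoint.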
Usage
Use this implementation when you need to:
- Submit a new evaluation job to the running lmms-eval server
- Queue multiple evaluations for sequential processing
- Obtain a job ID for subsequent status polling or cancellation
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/entrypoints/http_server.py, Lines: L97-113
- File: lmms_eval/entrypoints/protocol.py, Lines: L24-38 (EvaluateRequest), L55-61 (JobSubmitResponse)
- File: lmms_eval/entrypoints/job_scheduler.py, Lines: L119-143 (add_job)
Signature
```python
# HTTP endpoint
@app.post("/evaluate", response_model=JobSubmitResponse)
async def submit_evaluation(request: Request, eval_request: EvaluateRequest):
    """Submit an evaluation job to the queue."""

# EvaluateRequest model
class EvaluateRequest(BaseModel):
    model: str = Field(..., description="Model name or path")
    tasks: List[str] = Field(..., description="List of task names to evaluate")
    model_args: Optional[Dict[str, Any]] = Field(default=None)
    num_fewshot: Optional[int] = Field(default=None)
    batch_size: Optional[Union[int, str]] = Field(default=None)
    device: Optional[str] = Field(default=None)
    limit: Optional[Union[int, float]] = Field(default=None)
    gen_kwargs: Optional[str] = Field(default=None)
    log_samples: bool = Field(default=True)
    predict_only: bool = Field(default=False)
    num_gpus: int = Field(default=1)
    output_dir: Optional[str] = Field(default=None)

# Scheduler method
async def add_job(self, request: EvaluateRequest) -> tuple[str, int]:
    """Create and queue a new job. Returns (job_id, position_in_queue)."""
```
Import
```python
from lmms_eval.entrypoints.protocol import EvaluateRequest, JobSubmitResponse
from lmms_eval.entrypoints.job_scheduler import JobScheduler
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model name or path (e.g., "qwen2_5_vl", "llava") |
| tasks | List[str] | Yes | List of evaluation task names (e.g., ["mmmu_val", "mme"]) |
| model_args | Optional[Dict[str, Any]] | No | Model-specific arguments such as pretrained path, max_pixels, attention implementation |
| num_fewshot | Optional[int] | No | Number of few-shot examples to provide |
| batch_size | Optional[Union[int, str]] | No | Batch size for evaluation; can be an integer or "auto" |
| device | Optional[str] | No | Device to run evaluation on (e.g., "cuda:0") |
| limit | Optional[Union[int, float]] | No | Limit the number of evaluation examples (for testing) |
| gen_kwargs | Optional[str] | No | Generation keyword arguments as a string |
| log_samples | bool | No (default: True) | Whether to log individual sample predictions |
| predict_only | bool | No (default: False) | Only generate predictions; skip metric computation |
| num_gpus | int | No (default: 1) | Number of GPUs to use for the evaluation |
| output_dir | Optional[str] | No | Custom output directory for results; defaults to a temporary directory |
Outputs
| Name | Type | Description |
|---|---|---|
| job_id | str | UUID4 identifier for the submitted job |
| status | JobStatus | Always "queued" on successful submission |
| position_in_queue | int | Zero-indexed position in the processing queue |
| message | str | Human-readable confirmation message |
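On the client side, these four fields can be unpacked from the response body with the standard library alone. The sketch below is a hypothetical client-side helper (not part of lmms-eval), and the sample payload uses made-up values:

```python
import json
from dataclasses import dataclass


@dataclass
class SubmitResult:
    # Client-side mirror of the JobSubmitResponse fields (hypothetical helper)
    job_id: str
    status: str
    position_in_queue: int
    message: str


def parse_submit_response(body: str) -> SubmitResult:
    """Parse the JSON body returned by POST /evaluate."""
    data = json.loads(body)
    return SubmitResult(
        job_id=data["job_id"],
        status=data["status"],
        position_in_queue=data["position_in_queue"],
        message=data["message"],
    )


# Example payload shaped like a successful submission (values are fabricated)
sample = (
    '{"job_id": "00000000-0000-4000-8000-000000000000", '
    '"status": "queued", "position_in_queue": 0, "message": "Job queued"}'
)
result = parse_submit_response(sample)
```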
Usage Examples
Basic Example
```python
import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "qwen2_5_vl",
        "tasks": ["mmmu_val"],
        "model_args": {
            "pretrained": "Qwen/Qwen2.5-VL-3B-Instruct",
            "max_pixels": 12845056,
        },
        "batch_size": 128,
        "num_gpus": 1,
    },
)
result = response.json()
print(f"Job ID: {result['job_id']}, Position: {result['position_in_queue']}")
```
Minimal Submission Example
```python
import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "llava",
        "tasks": ["mme"],
    },
)
job_id = response.json()["job_id"]
```
Using curl
```shell
curl -X POST http://localhost:8000/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2_5_vl",
    "tasks": ["mmmu_val", "mme"],
    "model_args": {"pretrained": "Qwen/Qwen2.5-VL-3B-Instruct"},
    "batch_size": 64,
    "limit": 100
  }'
```