Implementation: EvolvingLMMs-Lab lmms-eval Evaluate Endpoint
| Knowledge Sources | |
|---|---|
| Domains | Server, Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A REST endpoint for submitting evaluation jobs to the lmms-eval server, with request validation and queue placement handled by the framework.
Description
The POST /evaluate endpoint accepts an EvaluateRequest JSON body, validates it through Pydantic schema enforcement, and delegates to JobScheduler.add_job() to create a new job entry and enqueue it. The endpoint returns a JobSubmitResponse containing the generated job ID, initial status of "queued", the position in the queue, and a human-readable confirmation message.
The EvaluateRequest model defines two required fields (model and tasks) and several optional fields for controlling evaluation behavior. The JobScheduler.add_job() method generates a UUID4 job identifier, creates a JobInfo record under an async lock, and places the job ID onto the internal asyncio.Queue for background processing.
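The queueing flow described above can be sketched as a minimal stand-alone approximation. This is not the actual JobScheduler implementation; `MiniScheduler` and the simplified `JobInfo` record here are illustrative stand-ins, assuming only the behavior stated above (UUID4 id, record created under an async lock, job id placed on an asyncio.Queue):

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class JobInfo:
    # Simplified placeholder for the real JobInfo record
    job_id: str
    status: str = "queued"
    request: dict = field(default_factory=dict)


class MiniScheduler:
    """Toy sketch of an add_job-style method, not the lmms-eval implementation."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobInfo] = {}
        self._lock = asyncio.Lock()
        self._queue: asyncio.Queue = asyncio.Queue()

    async def add_job(self, request: dict) -> tuple[str, int]:
        job_id = str(uuid.uuid4())
        async with self._lock:            # protect the shared job table
            self._jobs[job_id] = JobInfo(job_id=job_id, request=request)
        await self._queue.put(job_id)     # hand off to the background worker
        return job_id, self._queue.qsize() - 1  # zero-indexed queue position


async def demo() -> tuple[str, int]:
    sched = MiniScheduler()
    return await sched.add_job({"model": "llava", "tasks": ["mme"]})
```

A first submission lands at position 0, matching the zero-indexed `position_in_queue` returned by the endpoint.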
Usage
Use this implementation when you need to:
- Submit a new evaluation job to the running lmms-eval server
- Queue multiple evaluations for sequential processing
- Obtain a job ID for subsequent status polling or cancellation
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/entrypoints/http_server.py, Lines: L97-113
- File: lmms_eval/entrypoints/protocol.py, Lines: L24-38 (EvaluateRequest), L55-61 (JobSubmitResponse)
- File: lmms_eval/entrypoints/job_scheduler.py, Lines: L119-143 (add_job)
Signature
```python
# HTTP endpoint
@app.post("/evaluate", response_model=JobSubmitResponse)
async def submit_evaluation(request: Request, eval_request: EvaluateRequest):
    """Submit an evaluation job to the queue."""

# EvaluateRequest model
class EvaluateRequest(BaseModel):
    model: str = Field(..., description="Model name or path")
    tasks: List[str] = Field(..., description="List of task names to evaluate")
    model_args: Optional[Dict[str, Any]] = Field(default=None)
    num_fewshot: Optional[int] = Field(default=None)
    batch_size: Optional[Union[int, str]] = Field(default=None)
    device: Optional[str] = Field(default=None)
    limit: Optional[Union[int, float]] = Field(default=None)
    gen_kwargs: Optional[str] = Field(default=None)
    log_samples: bool = Field(default=True)
    predict_only: bool = Field(default=False)
    num_gpus: int = Field(default=1)
    output_dir: Optional[str] = Field(default=None)

# Scheduler method
async def add_job(self, request: EvaluateRequest) -> tuple[str, int]:
    """Create and queue a new job. Returns (job_id, position_in_queue)."""
```
Import
```python
from lmms_eval.entrypoints.protocol import EvaluateRequest, JobSubmitResponse
from lmms_eval.entrypoints.job_scheduler import JobScheduler
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model name or path (e.g., "qwen2_5_vl", "llava") |
| tasks | List[str] | Yes | List of evaluation task names (e.g., ["mmmu_val", "mme"]) |
| model_args | Optional[Dict[str, Any]] | No | Model-specific arguments such as pretrained path, max_pixels, attention implementation |
| num_fewshot | Optional[int] | No | Number of few-shot examples to provide |
| batch_size | Optional[Union[int, str]] | No | Batch size for evaluation; can be an integer or "auto" |
| device | Optional[str] | No | Device to run evaluation on (e.g., "cuda:0") |
| limit | Optional[Union[int, float]] | No | Limit the number of evaluation examples (for testing) |
| gen_kwargs | Optional[str] | No | Generation keyword arguments as a string |
| log_samples | bool | No (default: True) | Whether to log individual sample predictions |
| predict_only | bool | No (default: False) | Only generate predictions; skip metric computation |
| num_gpus | int | No (default: 1) | Number of GPUs to use for the evaluation |
| output_dir | Optional[str] | No | Custom output directory for results; defaults to a temporary directory |
Outputs
| Name | Type | Description |
|---|---|---|
| job_id | str | UUID4 identifier for the submitted job |
| status | JobStatus | Always "queued" on successful submission |
| position_in_queue | int | Zero-indexed position in the processing queue |
| message | str | Human-readable confirmation message |
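On the client side, these four fields can be unpacked from the response body with the standard library alone. The sketch below is a hypothetical client-side helper (not part of lmms-eval), and the sample payload uses made-up values:

```python
import json
from dataclasses import dataclass


@dataclass
class SubmitResult:
    # Client-side mirror of the JobSubmitResponse fields (hypothetical helper)
    job_id: str
    status: str
    position_in_queue: int
    message: str


def parse_submit_response(body: str) -> SubmitResult:
    """Parse the JSON body returned by POST /evaluate."""
    data = json.loads(body)
    return SubmitResult(
        job_id=data["job_id"],
        status=data["status"],
        position_in_queue=data["position_in_queue"],
        message=data["message"],
    )


# Example payload shaped like a successful submission (values are fabricated)
sample = (
    '{"job_id": "00000000-0000-4000-8000-000000000000", '
    '"status": "queued", "position_in_queue": 0, "message": "Job queued"}'
)
result = parse_submit_response(sample)
```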
Usage Examples
Basic Example
```python
import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "qwen2_5_vl",
        "tasks": ["mmmu_val"],
        "model_args": {
            "pretrained": "Qwen/Qwen2.5-VL-3B-Instruct",
            "max_pixels": 12845056,
        },
        "batch_size": 128,
        "num_gpus": 1,
    },
)
result = response.json()
print(f"Job ID: {result['job_id']}, Position: {result['position_in_queue']}")
```
Minimal Submission Example
```python
import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "llava",
        "tasks": ["mme"],
    },
)
job_id = response.json()["job_id"]
```
Using curl
```shell
curl -X POST http://localhost:8000/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2_5_vl",
    "tasks": ["mmmu_val", "mme"],
    "model_args": {"pretrained": "Qwen/Qwen2.5-VL-3B-Instruct"},
    "batch_size": 64,
    "limit": 100
  }'
```