Implementation:EvolvingLMMs Lab Lmms eval Evaluate Endpoint

From Leeroopedia
Domains: Server, Evaluation
Last Updated: 2026-02-14 00:00 GMT

Overview

A concrete tool for submitting evaluation jobs via a REST API, with request validation and queue placement, provided by the lmms-eval framework.

Description

The POST /evaluate endpoint accepts an EvaluateRequest JSON body, validates it through Pydantic schema enforcement, and delegates to JobScheduler.add_job() to create a new job entry and enqueue it. The endpoint returns a JobSubmitResponse containing the generated job ID, initial status of "queued", the position in the queue, and a human-readable confirmation message.
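
The following sketch illustrates the handler flow just described; it is not the verbatim source, which lives at lmms_eval/entrypoints/http_server.py (L97-113). The app.state.scheduler attribute and the message wording are assumptions for illustration.

# Illustrative sketch of the handler flow; the real implementation may differ.
@app.post("/evaluate", response_model=JobSubmitResponse)
async def submit_evaluation(request: Request, eval_request: EvaluateRequest):
    """Submit an evaluation job to the queue."""
    scheduler: JobScheduler = request.app.state.scheduler  # attribute name assumed
    job_id, position = await scheduler.add_job(eval_request)
    return JobSubmitResponse(
        job_id=job_id,
        status="queued",                # initial status on successful submission
        position_in_queue=position,
        message=f"Job {job_id} queued at position {position}",  # wording illustrative
    )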

The EvaluateRequest model defines two required fields (model and tasks) and several optional fields for controlling evaluation behavior. The JobScheduler.add_job() method generates a UUID4 job identifier, creates a JobInfo record under an async lock, and places the job ID onto the internal asyncio.Queue for background processing.
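
A minimal sketch of that queueing logic, assuming attribute names (_lock, _jobs, _queue) and JobInfo fields chosen for illustration; the actual method is at lmms_eval/entrypoints/job_scheduler.py L119-143.

import uuid

async def add_job(self, request: EvaluateRequest) -> tuple[str, int]:
    """Create and queue a new job. Returns (job_id, position_in_queue)."""
    job_id = str(uuid.uuid4())  # UUID4 job identifier
    async with self._lock:  # attribute names in this block are assumptions
        # Record the job before enqueueing so status lookups can find it
        self._jobs[job_id] = JobInfo(job_id=job_id, status="queued", request=request)
        position = self._queue.qsize()  # zero-indexed: next slot in the queue
        await self._queue.put(job_id)   # background worker consumes from here
    return job_id, position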

Usage

Use this implementation when you need to:

  • Submit a new evaluation job to the running lmms-eval server
  • Queue multiple evaluations for sequential processing
  • Obtain a job ID for subsequent status polling or cancellation (see the polling sketch after this list)
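
A hedged sketch of how the returned job ID might be used for polling. This page documents only POST /evaluate; the GET /jobs/{job_id} route and the status values checked below are hypothetical, so confirm them against the server's actual route table.

import time
import httpx

job_id = "..."  # value returned by POST /evaluate

# HYPOTHETICAL route -- not documented on this page.
while True:
    status = httpx.get(f"http://localhost:8000/jobs/{job_id}").json()["status"]
    if status not in ("queued", "running"):  # assumed terminal-state check
        break
    time.sleep(5)  # poll every few seconds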

Code Reference

Source Location

  • Repository: lmms-eval
  • File: lmms_eval/entrypoints/http_server.py
  • Lines: L97-113
  • File: lmms_eval/entrypoints/protocol.py
  • Lines: L24-38 (EvaluateRequest), L55-61 (JobSubmitResponse)
  • File: lmms_eval/entrypoints/job_scheduler.py
  • Lines: L119-143 (add_job)

Signature

# Supporting imports (for reference)
from typing import Any, Dict, List, Optional, Union
from fastapi import Request
from pydantic import BaseModel, Field

# HTTP endpoint
@app.post("/evaluate", response_model=JobSubmitResponse)
async def submit_evaluation(request: Request, eval_request: EvaluateRequest):
    """Submit an evaluation job to the queue."""

# EvaluateRequest model
class EvaluateRequest(BaseModel):
    model: str = Field(..., description="Model name or path")
    tasks: List[str] = Field(..., description="List of task names to evaluate")
    model_args: Optional[Dict[str, Any]] = Field(default=None)
    num_fewshot: Optional[int] = Field(default=None)
    batch_size: Optional[Union[int, str]] = Field(default=None)
    device: Optional[str] = Field(default=None)
    limit: Optional[Union[int, float]] = Field(default=None)
    gen_kwargs: Optional[str] = Field(default=None)
    log_samples: bool = Field(default=True)
    predict_only: bool = Field(default=False)
    num_gpus: int = Field(default=1)
    output_dir: Optional[str] = Field(default=None)

# Scheduler method
async def add_job(self, request: EvaluateRequest) -> tuple[str, int]:
    """Create and queue a new job. Returns (job_id, position_in_queue)."""

Import

from lmms_eval.entrypoints.protocol import EvaluateRequest, JobSubmitResponse
from lmms_eval.entrypoints.job_scheduler import JobScheduler
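
These imports also allow building a request body in-process (e.g., in tests). A small sketch, assuming Pydantic v2's model_dump (use .dict() on v1):

req = EvaluateRequest(model="qwen2_5_vl", tasks=["mmmu_val"])
payload = req.model_dump(exclude_none=True)  # JSON-ready dict; drops fields left at None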

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | str | Yes | Model name or path (e.g., "qwen2_5_vl", "llava") |
| tasks | List[str] | Yes | List of evaluation task names (e.g., ["mmmu_val", "mme"]) |
| model_args | Optional[Dict[str, Any]] | No | Model-specific arguments such as pretrained path, max_pixels, attention implementation |
| num_fewshot | Optional[int] | No | Number of few-shot examples to provide |
| batch_size | Optional[Union[int, str]] | No | Batch size for evaluation; an integer or "auto" |
| device | Optional[str] | No | Device to run evaluation on (e.g., "cuda:0") |
| limit | Optional[Union[int, float]] | No | Limit on the number of evaluation examples (for testing) |
| gen_kwargs | Optional[str] | No | Generation keyword arguments as a string |
| log_samples | bool | No (default: True) | Whether to log individual sample predictions |
| predict_only | bool | No (default: False) | Only generate predictions; skip metric computation |
| num_gpus | int | No (default: 1) | Number of GPUs to use for the evaluation |
| output_dir | Optional[str] | No | Custom output directory for results; defaults to a temporary directory |

Outputs

| Name | Type | Description |
|------|------|-------------|
| job_id | str | UUID4 identifier for the submitted job |
| status | JobStatus | Always "queued" on successful submission |
| position_in_queue | int | Zero-indexed position in the processing queue |
| message | str | Human-readable confirmation message |
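
An illustrative response body; the job_id value and message wording below are made up for the example:

{
  "job_id": "3f2a9c1e-7b4d-4e2a-9f61-0c8d5a2b7e10",
  "status": "queued",
  "position_in_queue": 0,
  "message": "Job queued at position 0"
}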

Usage Examples

Basic Example

import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "qwen2_5_vl",
        "tasks": ["mmmu_val"],
        "model_args": {
            "pretrained": "Qwen/Qwen2.5-VL-3B-Instruct",
            "max_pixels": 12845056,
        },
        "batch_size": 128,
        "num_gpus": 1,
    },
)
response.raise_for_status()  # fail fast on validation errors (FastAPI returns 422)
result = response.json()
print(f"Job ID: {result['job_id']}, Position: {result['position_in_queue']}")

Minimal Submission Example

import httpx

response = httpx.post(
    "http://localhost:8000/evaluate",
    json={
        "model": "llava",
        "tasks": ["mme"],
    },
)
job_id = response.json()["job_id"]

Using curl

curl -X POST http://localhost:8000/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2_5_vl",
    "tasks": ["mmmu_val", "mme"],
    "model_args": {"pretrained": "Qwen/Qwen2.5-VL-3B-Instruct"},
    "batch_size": 64,
    "limit": 100
  }'
