
Principle:Mlflow Mlflow Predict Function Definition

From Leeroopedia
Knowledge Sources
Domains ML_Ops, LLM_Evaluation
Last Updated 2026-02-13 20:00 GMT

Overview

Wrapping a generative AI application behind a standardised callable interface so that the evaluation harness can invoke it uniformly for each input row.

Description

In many evaluation scenarios the model outputs are not pre-computed. Instead, the evaluation harness must call the application under test for each row of the evaluation dataset and capture both the response and an execution trace. The predict function definition principle establishes the contract that such a callable must satisfy.

The core requirement is simple: the function accepts keyword arguments whose names correspond to the keys inside the inputs dictionary of the evaluation dataset. For example, if each row's inputs dictionary contains {"question": "...", "context": "..."}, the predict function should accept question and context as keyword parameters. The harness unpacks the inputs dictionary and passes each key-value pair as a named argument. The function returns the model's output, which can be any type.
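The calling convention can be illustrated with a minimal sketch. The `run_rows` helper below is a hypothetical stand-in for the harness loop, not its actual implementation, and the `predict_fn` body is a placeholder for a real application call.

```python
from typing import Any, Callable


def predict_fn(question: str, context: str) -> str:
    """Hypothetical application under test; parameter names mirror the inputs keys."""
    return f"Answer to {question!r} using {len(context.split())} context words"


def run_rows(fn: Callable[..., Any], rows: list[dict]) -> list[Any]:
    """Stand-in for the harness loop: unpack each row's inputs dict into kwargs."""
    return [fn(**row["inputs"]) for row in rows]


rows = [
    {
        "inputs": {
            "question": "What is MLflow?",
            "context": "MLflow is an open source platform",
        }
    },
]
outputs = run_rows(predict_fn, rows)
```

If a key in the inputs dictionary has no matching parameter (say `query` instead of `question`), the unpacking call raises `TypeError`, which is why the parameter names and dictionary keys have to agree.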

Beyond the basic calling convention, the predict function must produce exactly one execution trace per invocation. Traces capture the full call graph of the application -- LLM calls, tool invocations, retrieval steps -- and are essential for trace-aware scorers. If the function already produces traces (e.g., through auto-tracing integrations with OpenAI, LangChain, or similar frameworks), no additional instrumentation is needed. If it does not, the evaluation harness automatically wraps the function with tracing to ensure trace availability.

The principle also accommodates asynchronous functions. When an async function is provided, the harness wraps it in a synchronous adapter so that the evaluation loop, which processes rows sequentially or in a bounded thread pool, can call it without requiring an external event loop.
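A synchronous adapter of this kind might look like the following sketch. The name `wrap_async_to_sync` and the default timeout value are assumptions made for illustration; the harness's own adapter may differ in detail.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable


def wrap_async_to_sync(fn: Callable[..., Any], timeout: float = 5.0) -> Callable[..., Any]:
    """Return fn unchanged if it is synchronous; otherwise wrap it in a blocking adapter."""
    if not inspect.iscoroutinefunction(fn):
        return fn

    @functools.wraps(fn)
    def sync_fn(**kwargs: Any) -> Any:
        # asyncio.run creates and closes a fresh event loop for each call;
        # asyncio.wait_for cancels the coroutine if it exceeds the timeout.
        return asyncio.run(asyncio.wait_for(fn(**kwargs), timeout=timeout))

    return sync_fn


async def async_predict(question: str) -> str:
    """Hypothetical async application; the sleep stands in for a network call."""
    await asyncio.sleep(0.01)
    return question.upper()


sync_predict = wrap_async_to_sync(async_predict)
result = sync_predict(question="hello")
```

One caveat of this approach: `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so an adapter built this way has to be invoked from ordinary synchronous code or from a worker thread.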

Usage

Define a predict function whenever the evaluation dataset contains only inputs (and optionally expectations) but not pre-computed outputs. This is the standard pattern during development iteration: you change the application code, run evaluation with the same dataset, and compare metrics across runs. Omit the predict function when the dataset already includes outputs or trace columns -- for example, when re-scoring historical traces retrieved from the tracking store.
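The decision rule can be written down as a small check. This is only a sketch: the helper `needs_predict_fn` is hypothetical, and the row keys `outputs` and `trace` are assumed names for the pre-computed columns described above.

```python
def needs_predict_fn(rows: list[dict]) -> bool:
    """Return True if any row lacks both a pre-computed output and a trace."""
    return any("outputs" not in row and "trace" not in row for row in rows)


# Development iteration: inputs and expectations only, so the app must be called.
dev_rows = [
    {
        "inputs": {"question": "What is MLflow?"},
        "expectations": {"expected_response": "A platform"},
    },
]

# Re-scoring: outputs were captured earlier, so no predict function is needed.
historical_rows = [
    {"inputs": {"question": "What is MLflow?"}, "outputs": "MLflow is a platform"},
]

dev_needs_fn = needs_predict_fn(dev_rows)
historical_needs_fn = needs_predict_fn(historical_rows)
```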

Theoretical Basis

The predict function serves as an adapter between the application's native interface and the evaluation harness's standardised invocation protocol. The key design decisions are:

  • Keyword-argument unpacking: The harness calls predict_fn(**inputs_dict), so the function's parameter names must match the dictionary keys. This convention avoids positional-argument ambiguity and makes the data-to-function binding self-documenting.
  • Single-trace guarantee: Each call must produce exactly one trace. The harness detects whether the function is already instrumented by calling it once on a sample input and checking whether a trace was emitted; if none was, it wraps the function with @mlflow.trace.
  • Async transparency: Async functions are detected via inspect.iscoroutinefunction and wrapped in a synchronous adapter that runs the coroutine with asyncio.run(). A configurable timeout bounds each call so that a hung invocation cannot stall the evaluation.
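The trace-detection step can be sketched with a toy in-memory trace store. Everything here is an illustrative assumption: `TRACES`, `add_tracing`, `emits_trace` and `ensure_traced` are hypothetical stand-ins for a real tracing backend and for the harness's own logic, which would use `@mlflow.trace` rather than this decorator.

```python
import functools
from typing import Any, Callable

TRACES: list[str] = []  # toy trace store standing in for a real tracing backend


def add_tracing(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so that every call records exactly one trace entry."""
    @functools.wraps(fn)
    def traced(**kwargs: Any) -> Any:
        TRACES.append(fn.__name__)
        return fn(**kwargs)
    return traced


def emits_trace(fn: Callable[..., Any], sample_input: dict) -> bool:
    """Probe fn once on a sample input and report whether a trace appeared."""
    before = len(TRACES)
    fn(**sample_input)
    emitted = len(TRACES) > before
    del TRACES[before:]  # discard probe traces so they do not pollute results
    return emitted


def ensure_traced(fn: Callable[..., Any], sample_input: dict) -> Callable[..., Any]:
    """Leave instrumented functions alone; wrap uninstrumented ones."""
    return fn if emits_trace(fn, sample_input) else add_tracing(fn)


def plain_predict(question: str) -> str:
    return question[::-1]


@add_tracing
def instrumented_predict(question: str) -> str:
    return question.lower()


sample = {"question": "abc"}
wrapped_plain = ensure_traced(plain_predict, sample)
wrapped_instrumented = ensure_traced(instrumented_predict, sample)

wrapped_plain(question="abc")
wrapped_instrumented(question="ABC")
```

Note that the probe really does execute the function once, so any side effects of the application (API calls, cost, writes) occur during detection as well as during evaluation. Skipping the second wrap for already-instrumented functions is what keeps the count at one trace per call.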

Pseudocode for the wrapping logic:

function prepare_predict_fn(fn, sample_input):
    if is_async(fn):
        fn = wrap_async_to_sync(fn, timeout)
    if not emits_trace(fn, sample_input):
        fn = add_tracing(fn)
    return lambda row: fn(**row)      # unpack inputs dict to kwargs

This layered wrapping ensures that regardless of how the user defines the function, the harness always receives a synchronous, traced, keyword-argument callable.

Related Pages

Implemented By
