Implementation:Mlflow Mlflow Load Prompt

Knowledge Sources	MLflow MLflow Prompt API
Domains	ML_Ops, Prompt_Engineering
Last Updated	2026-02-13 20:00 GMT

Overview

Concrete tool for loading a versioned prompt from the MLflow Prompt Registry and formatting its template with variable substitution, provided by the MLflow library.

Description

The mlflow.genai.load_prompt() function retrieves a specific prompt version from the MLflow Prompt Registry. It supports multiple addressing modes: by name with an explicit version, by URI (prompts:/name/version), by alias URI (prompts:/name@alias), or by the special @latest alias. The function returns a PromptVersion entity whose format() method performs variable substitution to produce the final prompt text.
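The addressing modes above can be illustrated with a small parser. This is a sketch only, not MLflow's own parsing code: `PromptRef` and `parse_prompt_ref` are hypothetical names introduced here, and the rule that a plain name needs an explicit version follows the input table on this page.

```python
import re
from typing import NamedTuple, Optional


class PromptRef(NamedTuple):
    """Hypothetical container for a parsed prompt reference (not an MLflow type)."""

    name: str
    version: Optional[int]
    alias: Optional[str]


# Matches prompts:/<name>/<version> or prompts:/<name>@<alias>
_URI_PATTERN = re.compile(
    r"^prompts:/(?P<name>[^/@]+)(?:/(?P<version>\d+)|@(?P<alias>[^/@]+))$"
)


def parse_prompt_ref(name_or_uri: str, version: Optional[int] = None) -> PromptRef:
    """Split a prompt name or URI into name, version, and alias (illustrative)."""
    if not name_or_uri.startswith("prompts:/"):
        # Plain name: the version must be supplied separately.
        if version is None:
            raise ValueError("a plain name needs an explicit version")
        return PromptRef(name_or_uri, int(version), None)
    if version is not None:
        # A URI already carries its version or alias.
        raise ValueError("version must not be passed together with a URI")
    match = _URI_PATTERN.match(name_or_uri)
    if match is None:
        raise ValueError(f"unrecognized prompt URI: {name_or_uri!r}")
    raw_version = match.group("version")
    return PromptRef(
        match.group("name"),
        int(raw_version) if raw_version else None,
        match.group("alias"),
    )
```

In this sketch `@latest` is treated like any other alias; how MLflow resolves it internally is not shown here.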

load_prompt() caches results with a configurable time-to-live (TTL). Alias-based loads default to a 60-second TTL (controlled by the MLFLOW_ALIAS_PROMPT_CACHE_TTL_SECONDS environment variable), because an alias can be repointed to a different version. Version-based loads have no default TTL, so cached entries do not expire, because a specific version is immutable. Setting cache_ttl_seconds=0 bypasses the cache entirely.
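The three TTL behaviors (finite TTL, no TTL, and bypass with 0) can be sketched with a toy cache. This is a minimal illustration under stated assumptions, not MLflow's internal cache: the `TTLCache` class, its `get_or_load` method, and the injectable `clock` are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Illustrative TTL cache; not MLflow's internal implementation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, expires_at); expires_at of None means "never expires"
        self._entries: Dict[str, Tuple[Any, Optional[float]]] = {}

    def get_or_load(
        self,
        key: str,
        loader: Callable[[], Any],
        ttl_seconds: Optional[float],
    ) -> Any:
        if ttl_seconds == 0:
            # Bypass: neither read from nor write to the cache.
            return loader()
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return value  # still fresh
        # Missing or expired: reload and store with a new expiry.
        value = loader()
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._entries[key] = (value, expires_at)
        return value
```

Under this model an alias-based load would pass `ttl_seconds=60` and a version-based load would pass `ttl_seconds=None`.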

The PromptVersion.format() method handles variable substitution. For templates using double-brace syntax ({{var}}), it performs direct replacement. For templates using Jinja2 control flow ({% %}), it invokes the Jinja2 rendering engine with optional sandboxing. When allow_partial=True, missing variables are preserved as placeholders in the returned PromptVersion rather than raising an error.
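The double-brace substitution and the effect of allow_partial can be approximated in a few lines. This is a simplified sketch, not the code in prompt_version.py: `fill_template` is a hypothetical helper, it returns a plain string even in the partial case (whereas format() returns a new PromptVersion), and it does not cover Jinja2 templates or chat prompts.

```python
import re
from typing import Any

# Matches {{name}} with optional whitespace inside the braces (an assumption).
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, allow_partial: bool = False, **values: Any) -> str:
    """Replace {{var}} placeholders; keep unknown ones only if allow_partial."""
    found = {m.group(1) for m in _PLACEHOLDER.finditer(template)}
    missing = found - set(values)
    if missing and not allow_partial:
        raise KeyError(f"missing template variables: {sorted(missing)}")

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Leave the placeholder untouched when no value was supplied.
        return str(values[name]) if name in values else match.group(0)

    return _PLACEHOLDER.sub(replace, template)
```

Calling `fill_template("Hi {{name}}, be {{style}}.", allow_partial=True, name="Ada")` leaves `{{style}}` in place, so a second call can fill it later.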

Usage

Use load_prompt() in application code to retrieve prompts at runtime. Use the format() method on the returned PromptVersion to substitute variables before passing the result to an LLM API.

Code Reference

Source Location

Repository: mlflow
File (load_prompt): mlflow/genai/prompts/__init__.py
Lines: L155-218
File (format): mlflow/entities/model_registry/prompt_version.py
Lines: L450-573

Signature

# load_prompt
def load_prompt(
    name_or_uri: str,
    version: str | int | None = None,
    allow_missing: bool = False,
    link_to_model: bool = True,
    model_id: str | None = None,
    cache_ttl_seconds: float | None = None,
) -> PromptVersion:
    ...

# PromptVersion.format
def format(
    self,
    allow_partial: bool = False,
    use_jinja_sandbox: bool = True,
    **kwargs,
) -> PromptVersion | str | list[dict[str, Any]]:
    ...

Import

import mlflow.genai

# Then call:
# prompt = mlflow.genai.load_prompt(...)
# result = prompt.format(...)

I/O Contract

Inputs (load_prompt)

Name	Type	Required	Description
name_or_uri	str	Yes	The prompt name (e.g., `"my_prompt"`) or a URI (e.g., `"prompts:/my_prompt/1"`, `"prompts:/my_prompt@production"`, `"prompts:/my_prompt@latest"`).
version	str | int | None	No	The version number. Required when using a plain name; not allowed when using a URI that already includes the version or alias.
allow_missing	bool	No	If True, return None instead of raising an exception when the prompt is not found. Defaults to False.
link_to_model	bool	No	If True, link the prompt to the current model. Defaults to True.
model_id	str | None	No	The ID of the model to link the prompt to. Only used if `link_to_model` is True.
cache_ttl_seconds	float | None	No	Time-to-live in seconds for the cached prompt. Defaults to 60s for alias-based loads, None (no TTL) for version-based loads. Set to 0 to bypass cache.

Inputs (format)

Name	Type	Required	Description
allow_partial	bool	No	If True, return a new PromptVersion with remaining placeholders when variables are missing. Defaults to False (raises error on missing variables).
use_jinja_sandbox	bool	No	If True, use Jinja2 SandboxedEnvironment for templates with control flow syntax. Defaults to True.
**kwargs	Any	No	Keyword arguments providing values for the template variables.

Outputs

Name	Type	Description
load_prompt return	PromptVersion	The loaded prompt version entity with template, metadata, variables, tags, and aliases.
format() return (complete)	str | list[dict[str, Any]]	For text prompts, a fully formatted string. For chat prompts, a list of formatted message dictionaries.
format() return (partial)	PromptVersion	When `allow_partial=True` and variables are missing, a new PromptVersion with the supplied variables filled and remaining placeholders intact.
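Because a complete format() call may return either a string or a list of message dictionaries, calling code may want to normalize the result before sending it to a chat API. The helper below is a hypothetical sketch, not part of MLflow: `to_chat_messages` and its choice to treat a text prompt as the system message are assumptions made for illustration.

```python
from typing import Any, Dict, List, Union


def to_chat_messages(
    formatted: Union[str, List[Dict[str, Any]]],
    user_text: str,
) -> List[Dict[str, Any]]:
    """Build a chat message list from either shape of formatted prompt."""
    user_message = {"role": "user", "content": user_text}
    if isinstance(formatted, str):
        # Text prompt: use it as the system message (an assumption).
        return [{"role": "system", "content": formatted}, user_message]
    if isinstance(formatted, list):
        # Chat prompt: keep its messages and append the user turn.
        return [*formatted, user_message]
    raise TypeError(f"unexpected formatted prompt type: {type(formatted).__name__}")
```

A partial result (a PromptVersion) is rejected with a TypeError here, which surfaces unfilled variables before the request is sent.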

Usage Examples

Load by Name and Version

import mlflow.genai

# Load a specific version
prompt = mlflow.genai.load_prompt("my_prompt", version=1)
result = prompt.format(style="friendly")
print(result)  # Formatted text with {{style}} replaced by "friendly"

Load by URI with Alias

import mlflow.genai

# Load the production version via alias
prompt = mlflow.genai.load_prompt("prompts:/my_prompt@production")

# Load the latest version
prompt = mlflow.genai.load_prompt("prompts:/my_prompt@latest")

# Load a specific version by URI
prompt = mlflow.genai.load_prompt("prompts:/my_prompt/3")

Format and Use with LLM

import mlflow.genai
import openai

# Load and format
prompt = mlflow.genai.load_prompt("prompts:/greeting@production")
system_message = prompt.format(style="friendly")

# Use with OpenAI
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Hello!"},
    ],
)

Partial Formatting

import mlflow.genai

prompt = mlflow.genai.load_prompt("my_prompt", version=1)

# Fill system-level variables first
partial = prompt.format(system_context="You are helpful.", allow_partial=True)

# Fill user-specific variables later
final = partial.format(user_query="What is MLflow?")

Custom Cache TTL

import mlflow.genai

# Cache for 5 minutes
prompt = mlflow.genai.load_prompt(
    "prompts:/my_prompt@production",
    cache_ttl_seconds=300,
)

# Bypass cache entirely
prompt = mlflow.genai.load_prompt(
    "prompts:/my_prompt@production",
    cache_ttl_seconds=0,
)
