Implementation:Mlflow Mlflow Load Prompt
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Prompt_Engineering |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Concrete tool for loading a versioned prompt from the MLflow Prompt Registry and formatting its template with variable substitution, provided by the MLflow library.
Description
The mlflow.genai.load_prompt() function retrieves a specific prompt version from the MLflow Prompt Registry. It supports multiple addressing modes: by name with an explicit version, by URI (prompts:/name/version), by alias URI (prompts:/name@alias), or by the special @latest alias. The function returns a PromptVersion entity whose format() method performs variable substitution to produce the final prompt text.
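The addressing modes above can be illustrated with a small parser. This is a minimal sketch, not MLflow's implementation: `PromptRef` and `parse_prompt_ref` are hypothetical names introduced here, and the error behavior is an assumption. It mirrors only the URI shapes listed in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PromptRef:
    """Hypothetical holder for a parsed prompt reference (not an MLflow class)."""

    name: str
    version: int | None = None
    alias: str | None = None


def parse_prompt_ref(name_or_uri: str, version: int | str | None = None) -> PromptRef:
    """Split the address forms described above into name, version, and alias."""
    prefix = "prompts:/"
    if not name_or_uri.startswith(prefix):
        # Plain name: the version comes from the separate argument.
        return PromptRef(name_or_uri, version=None if version is None else int(version))
    if version is not None:
        # Assumption: a URI already pins the version or alias, so both is an error.
        raise ValueError("version must not be passed together with a prompt URI")
    body = name_or_uri[len(prefix):]
    if "@" in body:
        # Alias form, including the special "latest" alias.
        name, alias = body.split("@", 1)
        return PromptRef(name, alias=alias)
    # Version form: everything after the last slash must be a number.
    name, _, ver = body.rpartition("/")
    if not name or not ver.isdigit():
        raise ValueError(f"unrecognized prompt URI: {name_or_uri!r}")
    return PromptRef(name, version=int(ver))
```

Keeping version and alias as separate optional fields makes it explicit that a reference resolves through exactly one of them.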
The load_prompt() function includes built-in caching with a configurable TTL. For alias-based loads, the default TTL is 60 seconds (controlled by the MLFLOW_ALIAS_PROMPT_CACHE_TTL_SECONDS environment variable). For version-based loads, there is no default TTL since specific versions are immutable. Setting cache_ttl_seconds=0 bypasses the cache entirely.
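The TTL rules above can be sketched with a small in-memory cache. This is an illustration under stated assumptions, not MLflow's internal cache: `resolve_ttl` and `TTLCache` are hypothetical names, and the precedence of an explicit `cache_ttl_seconds` over the environment variable is assumed. Only the variable name and the 60-second default come from the description.

```python
from __future__ import annotations

import os
import time
from typing import Any, Callable

# Variable name and 60-second default are taken from the description above.
ALIAS_TTL_ENV = "MLFLOW_ALIAS_PROMPT_CACHE_TTL_SECONDS"


def resolve_ttl(is_alias: bool, cache_ttl_seconds: float | None = None) -> float | None:
    """Pick the effective TTL: explicit value, else 60s for aliases, else no expiry."""
    if cache_ttl_seconds is not None:
        return cache_ttl_seconds  # assumption: an explicit argument wins
    if is_alias:
        return float(os.environ.get(ALIAS_TTL_ENV, "60"))
    return None  # version-based loads: immutable, so entries never expire


class TTLCache:
    """Hypothetical in-memory cache; not MLflow's internal cache class."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry timestamp or None for "never expires", cached value)
        self._entries: dict[str, tuple[float | None, Any]] = {}

    def get_or_load(self, key: str, ttl: float | None, loader: Callable[[], Any]) -> Any:
        if ttl == 0:
            return loader()  # a TTL of 0 bypasses the cache entirely
        hit = self._entries.get(key)
        now = self._clock()
        if hit is not None and (hit[0] is None or now < hit[0]):
            return hit[1]
        value = loader()
        self._entries[key] = (None if ttl is None else now + ttl, value)
        return value
```

Injecting the clock keeps the expiry logic testable without sleeping; a real cache would also need thread safety, which this sketch omits.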
The PromptVersion.format() method handles variable substitution. For templates using double-brace syntax ({{var}}), it performs direct replacement. For templates using Jinja2 control flow ({% %}), it invokes the Jinja2 rendering engine with optional sandboxing. When allow_partial=True, missing variables are preserved as placeholders in the returned PromptVersion rather than raising an error.
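The double-brace path and the effect of allow_partial can be sketched with a regular expression. This is a simplified illustration, not the body of PromptVersion.format(): `format_template` is a hypothetical helper, it returns a plain string in both cases (the real method returns a PromptVersion for partial results), the exception type is an assumption, and the Jinja2 path is not covered.

```python
from __future__ import annotations

import re

# Matches {{ name }} placeholders; tolerating inner whitespace is an assumption.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def format_template(template: str, allow_partial: bool = False, **kwargs: object) -> str:
    """Replace {{var}} placeholders; hypothetical helper, not PromptVersion.format."""
    missing = {m.group(1) for m in _PLACEHOLDER.finditer(template)} - set(kwargs)
    if missing and not allow_partial:
        raise KeyError(f"missing template variables: {sorted(missing)}")

    def substitute(match: re.Match[str]) -> str:
        name = match.group(1)
        # Leave unknown placeholders untouched so a later call can fill them.
        return str(kwargs[name]) if name in kwargs else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)
```

Because unfilled placeholders survive verbatim, the output of a partial call is itself a valid template for a second call.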
Usage
Use load_prompt() in application code to retrieve prompts at runtime. Use the format() method on the returned PromptVersion to substitute variables before passing the result to an LLM API.
Code Reference
Source Location
- Repository: mlflow
- File (load_prompt): mlflow/genai/prompts/__init__.py - Lines: L155-218
- File (format): mlflow/entities/model_registry/prompt_version.py - Lines: L450-573
Signature
# load_prompt
def load_prompt(
    name_or_uri: str,
    version: str | int | None = None,
    allow_missing: bool = False,
    link_to_model: bool = True,
    model_id: str | None = None,
    cache_ttl_seconds: float | None = None,
) -> PromptVersion:
    ...
# PromptVersion.format
def format(
    self,
    allow_partial: bool = False,
    use_jinja_sandbox: bool = True,
    **kwargs,
) -> PromptVersion | str | list[dict[str, Any]]:
    ...
Import
import mlflow.genai
# Then call:
# prompt = mlflow.genai.load_prompt(...)
# result = prompt.format(...)
I/O Contract
Inputs (load_prompt)
| Name | Type | Required | Description |
|---|---|---|---|
| name_or_uri | str | Yes | The prompt name (e.g., "my_prompt") or a URI (e.g., "prompts:/my_prompt/1", "prompts:/my_prompt@production", "prompts:/my_prompt@latest"). |
| version | str \| int \| None | No | The version number. Required when using a plain name; not allowed when using a URI that already includes the version or alias. |
| allow_missing | bool | No | If True, return None instead of raising an exception when the prompt is not found. Defaults to False. |
| link_to_model | bool | No | If True, link the prompt to the current model. Defaults to True. |
| model_id | str \| None | No | The ID of the model to link the prompt to. Only used if link_to_model is True. |
| cache_ttl_seconds | float \| None | No | Time-to-live in seconds for the cached prompt. Defaults to 60s for alias-based loads, None (no TTL) for version-based loads. Set to 0 to bypass the cache. |
Inputs (format)
| Name | Type | Required | Description |
|---|---|---|---|
| allow_partial | bool | No | If True, return a new PromptVersion with remaining placeholders when variables are missing. Defaults to False (raises error on missing variables). |
| use_jinja_sandbox | bool | No | If True, use Jinja2 SandboxedEnvironment for templates with control flow syntax. Defaults to True. |
| **kwargs | Any | No | Keyword arguments providing values for the template variables. |
Outputs
| Name | Type | Description |
|---|---|---|
| load_prompt return | PromptVersion | The loaded prompt version entity with template, metadata, variables, tags, and aliases. |
| format() return (complete) | str \| list[dict[str, Any]] | For text prompts, a fully formatted string. For chat prompts, a list of formatted message dictionaries. |
| format() return (partial) | PromptVersion | When allow_partial=True and variables are missing, a new PromptVersion with the supplied variables filled and remaining placeholders intact. |
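Because a complete format() call yields either a string or a list of message dictionaries, calling code may want to normalize both shapes before sending them to a chat API. The helper below is a hypothetical sketch (`to_chat_messages` is not part of MLflow), and the default "system" role for text prompts is an assumption.

```python
from __future__ import annotations

from typing import Any


def to_chat_messages(
    formatted: str | list[dict[str, Any]], role: str = "system"
) -> list[dict[str, Any]]:
    """Normalize a fully formatted prompt into a chat message list (hypothetical helper)."""
    if isinstance(formatted, str):
        # Text prompt: wrap the string in a single message.
        return [{"role": role, "content": formatted}]
    if isinstance(formatted, list):
        # Chat prompt: already a list of message dicts; copy to avoid aliasing.
        return [dict(message) for message in formatted]
    # Anything else (for example a partially formatted prompt object) is not ready to send.
    raise TypeError(f"expected str or list of messages, got {type(formatted).__name__}")
```

Rejecting other types early surfaces the case where a partially formatted PromptVersion is passed along by mistake.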
Usage Examples
Load by Name and Version
import mlflow.genai
# Load a specific version
prompt = mlflow.genai.load_prompt("my_prompt", version=1)
result = prompt.format(style="friendly")
print(result) # Formatted text with {{style}} replaced by "friendly"
Load by URI with Alias
import mlflow.genai
# Load the production version via alias
prompt = mlflow.genai.load_prompt("prompts:/my_prompt@production")
# Load the latest version
prompt = mlflow.genai.load_prompt("prompts:/my_prompt@latest")
# Load a specific version by URI
prompt = mlflow.genai.load_prompt("prompts:/my_prompt/3")
Format and Use with LLM
import mlflow.genai
import openai
# Load and format
prompt = mlflow.genai.load_prompt("prompts:/greeting@production")
system_message = prompt.format(style="friendly")
# Use with OpenAI
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Hello!"},
    ],
)
Partial Formatting
import mlflow.genai
prompt = mlflow.genai.load_prompt("my_prompt", version=1)
# Fill system-level variables first
partial = prompt.format(system_context="You are helpful.", allow_partial=True)
# Fill user-specific variables later
final = partial.format(user_query="What is MLflow?")
Custom Cache TTL
import mlflow.genai
# Cache for 5 minutes
prompt = mlflow.genai.load_prompt(
    "prompts:/my_prompt@production",
    cache_ttl_seconds=300,
)
# Bypass cache entirely
prompt = mlflow.genai.load_prompt(
    "prompts:/my_prompt@production",
    cache_ttl_seconds=0,
)