Principle:Explodinggradients Ragas Prompt Persistence

From Leeroopedia


Prompt Persistence

Prompt Persistence is a principle in the Ragas evaluation toolkit that enables serialization and deserialization of evaluation metric prompts for reuse, sharing, and deployment.

Motivation

Optimizing evaluation metric prompts (via Genetic Prompt Optimization or DSPy Prompt Optimization) is computationally expensive: it involves many LLM calls and depends on human annotations. Once an optimized prompt has been found, it should be saved to disk so that it can be reloaded without re-running the optimization. Prompt persistence also enables:

  • Version control -- Prompts can be committed to a repository and tracked alongside code changes.
  • Sharing -- Optimized prompts can be distributed to team members or across environments.
  • Deployment -- Production evaluation pipelines can load pre-optimized prompts at startup.
  • Language adaptation -- Prompts adapted to different languages can be saved and loaded by language tag.

Theoretical Foundation

Decoupling Optimization from Evaluation

Prompt persistence creates a clean separation between two phases of the metric lifecycle:

  1. Optimization phase -- Run once (or periodically) to discover the best prompt for a metric. This phase requires annotated data, an optimizer, and significant LLM budget.
  2. Evaluation phase -- Run repeatedly in production or CI/CD pipelines. This phase only needs the metric with its pre-loaded optimized prompt and the LLM for scoring.

By persisting the output of phase 1, phase 2 can operate independently and reproducibly.

Prompt as Configuration

In Ragas, prompts are instances of PydanticPrompt. Each instance bundles an instruction string, input and output schemas defined as Pydantic models, and optional few-shot examples. Treating prompts as serializable configuration objects (saved as JSON files) allows them to be managed with the same tools used for application configuration: diffing, code review, and environment-specific overrides.
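The idea of a prompt as a JSON-serializable configuration object can be illustrated with a minimal sketch. ToyPrompt below is a hypothetical stand-in built on the standard library; it is not the Ragas PydanticPrompt class, and its field names are assumptions chosen to mirror the description above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyPrompt:
    """Hypothetical stand-in for a prompt object; not the Ragas PydanticPrompt class."""

    instruction: str
    language: str = "english"
    # Each example pairs an input payload with the expected output payload.
    examples: list = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys keep the output stable, so version-control diffs stay small.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ToyPrompt":
        return cls(**json.loads(text))


prompt = ToyPrompt(
    instruction="Judge whether the answer is supported by the context.",
    examples=[{"input": {"answer": "Paris"}, "output": {"verdict": 1}}],
)

# Round trip: serialize to JSON text, then rebuild an equal object from it.
restored = ToyPrompt.from_json(prompt.to_json())
```

Because the serialized form is plain JSON, it can be reviewed and diffed like any other configuration file.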

File Naming Convention

Prompts are saved with a naming convention that encodes the metric name, prompt name, and language:

  • With metric name: {metric_name}_{prompt_name}_{language}.json
  • Without metric name: {prompt_name}_{language}.json

This convention allows multiple prompts for different metrics and languages to coexist in the same directory without collision.
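The naming convention can be sketched as a small helper. The function prompt_filename and the metric and prompt names used in the example are hypothetical, chosen only to illustrate the two patterns listed above.

```python
from pathlib import Path
from typing import Optional


def prompt_filename(prompt_name: str, language: str, metric_name: Optional[str] = None) -> str:
    """Build a file name following the convention above (hypothetical helper)."""
    parts = [prompt_name, language]
    if metric_name:  # an empty or missing metric name is left out of the file name
        parts.insert(0, metric_name)
    return "_".join(parts) + ".json"


# Illustrative names only; real metric and prompt names depend on the metric in use.
path = Path("prompts") / prompt_filename("statement_generator_prompt", "english", "faithfulness")
```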

Operations

The prompt persistence interface provides four core operations:

get_prompts

Retrieves all prompts associated with a metric as a dictionary mapping prompt names to PydanticPrompt instances. This is the discovery mechanism: callers can inspect which prompts a metric uses before saving or modifying them.

set_prompts

Replaces one or more prompts on a metric by name. Before replacing anything, it validates that each provided prompt name exists on the metric and that each value is a PydanticPrompt instance. This is the primary mechanism for applying optimized prompts to a metric.

save_prompts

Serializes all prompts to individual JSON files in a specified directory. Each prompt is saved as a separate file following the naming convention above. The directory must already exist.

load_prompts

Deserializes prompts from JSON files in a specified directory. If no language is specified, defaults to English. Returns a dictionary of loaded PydanticPrompt instances that can be passed to set_prompts().
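To make the contract of the four operations concrete, here is a minimal sketch using only the standard library. ToyPromptHolder is a hypothetical stand-in for a metric, and prompts are plain dictionaries rather than PydanticPrompt instances; the real Ragas implementation may differ in its details.

```python
import json
import os
import tempfile


class ToyPromptHolder:
    """Hypothetical stand-in for a metric; prompts are plain dicts in this sketch."""

    def __init__(self, name, prompts):
        self.name = name
        self._prompts = dict(prompts)

    def get_prompts(self):
        # Return a copy so callers can inspect prompts without mutating the holder.
        return dict(self._prompts)

    def set_prompts(self, **prompts):
        for key, value in prompts.items():
            if key not in self._prompts:
                raise ValueError(f"Unknown prompt name: {key}")
            if not isinstance(value, dict):
                raise ValueError(f"Prompt {key} must be a dict in this sketch")
            self._prompts[key] = value

    def _file(self, directory, prompt_name, language):
        prefix = f"{self.name}_" if self.name else ""
        return os.path.join(directory, f"{prefix}{prompt_name}_{language}.json")

    def save_prompts(self, directory):
        # The directory is not created here; it must already exist.
        if not os.path.isdir(directory):
            raise ValueError(f"Directory does not exist: {directory}")
        for prompt_name, prompt in self._prompts.items():
            target = self._file(directory, prompt_name, prompt["language"])
            with open(target, "w", encoding="utf-8") as fh:
                json.dump(prompt, fh, indent=2)

    def load_prompts(self, directory, language=None):
        language = language or "english"  # default when no language is given
        loaded = {}
        for prompt_name in self._prompts:
            with open(self._file(directory, prompt_name, language), encoding="utf-8") as fh:
                loaded[prompt_name] = json.load(fh)
        return loaded


original = ToyPromptHolder(
    "toy_metric", {"judge_prompt": {"instruction": "Tuned instruction.", "language": "english"}}
)
fresh = ToyPromptHolder(
    "toy_metric", {"judge_prompt": {"instruction": "Default instruction.", "language": "english"}}
)

with tempfile.TemporaryDirectory() as directory:
    original.save_prompts(directory)
    saved_files = sorted(os.listdir(directory))
    # Loading returns a dict that can be passed straight to set_prompts().
    fresh.set_prompts(**fresh.load_prompts(directory))
```

Note that load_prompts only returns the prompts; nothing changes on the holder until set_prompts is called with the result.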

Workflow

A typical prompt persistence workflow proceeds as follows:

  1. Optimize -- Use a GeneticOptimizer or DSPyOptimizer to find the best prompt instructions.
  2. Apply -- Set the optimized instructions on the metric's prompts.
  3. Save -- Call metric.save_prompts("./prompts/") to persist to disk.
  4. Load -- In a new session or environment, call metric.load_prompts("./prompts/") to restore.
  5. Set -- Apply the loaded prompts with metric.set_prompts(**loaded_prompts).

Language Support

Prompts can be adapted to different languages using adapt_prompts(), then saved under a language-specific filename (the {language} component of the naming convention). When loading, the language parameter selects which version to load. This enables multilingual evaluation pipelines in which the same metric logic runs in different languages with localized prompts.
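Because the language is part of the file name, the languages available for a given prompt can be discovered by inspecting the prompt directory. The helper available_languages below is a hypothetical sketch, not part of Ragas; it assumes the file name stem (metric name plus prompt name) is known in advance.

```python
import tempfile
from pathlib import Path


def available_languages(directory, stem):
    """Return languages for which '<stem>_<language>.json' exists (hypothetical helper)."""
    prefix = stem + "_"
    languages = []
    for path in Path(directory).glob(f"{stem}_*.json"):
        # Strip the known prefix; whatever remains of the stem is the language tag.
        languages.append(path.stem[len(prefix):])
    return sorted(languages)


with tempfile.TemporaryDirectory() as directory:
    for language in ("english", "hindi"):
        target = Path(directory) / f"toy_metric_judge_prompt_{language}.json"
        target.write_text("{}", encoding="utf-8")
    found = available_languages(directory, "toy_metric_judge_prompt")
```

The sketch treats everything after the stem as the language tag, so a stem that is a prefix of another prompt's name would produce false matches; a production version would need a stricter check.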
