Principle:Explodinggradients Ragas Prompt Persistence
Prompt Persistence
Prompt Persistence is a principle in the Ragas evaluation toolkit that enables serialization and deserialization of evaluation metric prompts for reuse, sharing, and deployment.
Motivation
Optimizing evaluation metric prompts (via Genetic Prompt Optimization or DSPy Prompt Optimization) is computationally expensive: it involves many LLM calls and relies on human annotations. Once an optimized prompt has been found, it should be saved to disk so that it can be reloaded without re-running the optimization. Prompt persistence also enables:
- Version control -- Prompts can be committed to a repository and tracked alongside code changes.
- Sharing -- Optimized prompts can be distributed to team members or across environments.
- Deployment -- Production evaluation pipelines can load pre-optimized prompts at startup.
- Language adaptation -- Prompts adapted to different languages can be saved and loaded by language tag.
Theoretical Foundation
Decoupling Optimization from Evaluation
Prompt persistence creates a clean separation between two phases of the metric lifecycle:
- Optimization phase -- Run once (or periodically) to discover the best prompt for a metric. This phase requires annotated data, an optimizer, and significant LLM budget.
- Evaluation phase -- Run repeatedly in production or CI/CD pipelines. This phase only needs the metric with its pre-loaded optimized prompt and the LLM for scoring.
By persisting the output of the optimization phase, the evaluation phase can run independently and reproducibly, without access to the annotated data or the optimizer.
Prompt as Configuration
In Ragas, prompts are instances of PydanticPrompt, which are Pydantic models containing an instruction string, input/output schemas, and optional few-shot examples. Treating prompts as serializable configuration objects (saved as JSON files) allows them to be managed with the same tools used for application configuration.
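The idea of a prompt as a serializable configuration object can be sketched with the standard library alone. ToyPrompt below is a hypothetical stand-in built on dataclasses, not the Ragas PydanticPrompt class, and its field names are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ToyPrompt:
    """Hypothetical stand-in for a prompt object; not the Ragas PydanticPrompt class."""

    name: str
    instruction: str
    language: str = "english"
    # Each few-shot example pairs an input payload with the expected output payload.
    examples: list = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys and indentation keep the file diff-friendly under version control.
        return json.dumps(asdict(self), indent=2, sort_keys=True, ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "ToyPrompt":
        return cls(**json.loads(text))


prompt = ToyPrompt(
    name="faithfulness_judge",
    instruction="Decide whether each statement is supported by the context.",
    examples=[{"input": {"statement": "Paris is in France."}, "output": {"verdict": 1}}],
)

# Round trip: serialize to JSON text, then rebuild an equal object from it.
restored = ToyPrompt.from_json(prompt.to_json())
```

Because the serialized form is plain JSON, it can be reviewed, diffed, and deployed like any other configuration file.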
File Naming Convention
Prompts are saved with a naming convention that encodes the metric name, prompt name, and language:
- With metric name: {metric_name}_{prompt_name}_{language}.json
- Without metric name: {prompt_name}_{language}.json
This convention allows multiple prompts for different metrics and languages to coexist in the same directory without collision.
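As a minimal sketch of the convention, the helper below builds both filename forms. The function name prompt_filename and the sample metric and prompt names are hypothetical and only illustrate the pattern described above.

```python
from typing import Optional


def prompt_filename(prompt_name: str, language: str, metric_name: Optional[str] = None) -> str:
    """Build a file name following the naming convention (hypothetical helper)."""
    if metric_name:
        return f"{metric_name}_{prompt_name}_{language}.json"
    return f"{prompt_name}_{language}.json"


# Illustrative names only; real metric and prompt names depend on the metric in use.
with_metric = prompt_filename("statement_generator", "english", metric_name="faithfulness")
without_metric = prompt_filename("statement_generator", "spanish")
```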
Operations
The prompt persistence interface provides four core operations:
get_prompts
Retrieves all prompts associated with a metric as a dictionary mapping prompt names to PydanticPrompt instances. This is the discovery mechanism: callers can inspect which prompts a metric uses before saving or modifying them.
set_prompts
Replaces one or more prompts on a metric by name. Validates that the provided prompt names exist and that the values are PydanticPrompt instances. This is the primary mechanism for applying optimized prompts to a metric.
save_prompts
Serializes all prompts to individual JSON files in a specified directory. Each prompt is saved as a separate file following the naming convention above. The directory must already exist.
load_prompts
Deserializes prompts from JSON files in a specified directory. If no language is specified, defaults to English. Returns a dictionary of loaded PydanticPrompt instances that can be passed to set_prompts().
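The behavior of the four operations can be illustrated with a small self-contained model. ToyPrompt and ToyMetric are hypothetical stand-ins written for this sketch, not the Ragas classes; the JSON layout and error messages are assumptions, and only the behavior described above (name and type validation, one file per prompt, an existing directory, English as the default language) is modeled.

```python
import json
import os
import tempfile


class ToyPrompt:
    """Hypothetical stand-in for PydanticPrompt: an instruction plus a language."""

    def __init__(self, instruction, language="english"):
        self.instruction = instruction
        self.language = language


class ToyMetric:
    """Hypothetical metric exposing the four operations; not the Ragas implementation."""

    name = "toy_metric"

    def __init__(self):
        self._prompts = {"judge_prompt": ToyPrompt("Rate the answer from 1 to 5.")}

    def get_prompts(self):
        # Return a copy so callers can inspect prompts without mutating the metric.
        return dict(self._prompts)

    def set_prompts(self, **prompts):
        for key, value in prompts.items():
            if key not in self._prompts:
                raise ValueError(f"Unknown prompt name: {key}")
            if not isinstance(value, ToyPrompt):
                raise ValueError(f"Prompt {key} must be a ToyPrompt instance")
            self._prompts[key] = value

    def save_prompts(self, path):
        # The target directory must already exist; it is not created here.
        if not os.path.isdir(path):
            raise ValueError(f"The path {path} does not exist.")
        for key, prompt in self._prompts.items():
            file_name = f"{self.name}_{key}_{prompt.language}.json"
            with open(os.path.join(path, file_name), "w", encoding="utf-8") as f:
                json.dump({"instruction": prompt.instruction, "language": prompt.language}, f)

    def load_prompts(self, path, language=None):
        language = language or "english"  # default when no language is given
        loaded = {}
        for key in self._prompts:
            file_name = f"{self.name}_{key}_{language}.json"
            with open(os.path.join(path, file_name), encoding="utf-8") as f:
                data = json.load(f)
            loaded[key] = ToyPrompt(data["instruction"], data["language"])
        return loaded


with tempfile.TemporaryDirectory() as directory:
    # Stand-in for the optimization phase: apply an improved prompt and save it.
    optimized = ToyMetric()
    optimized.set_prompts(judge_prompt=ToyPrompt("Rate faithfulness from 1 to 5."))
    optimized.save_prompts(directory)

    # Stand-in for the evaluation phase: a fresh metric loads and applies the saved prompt.
    fresh = ToyMetric()
    fresh.set_prompts(**fresh.load_prompts(directory))

restored_instruction = fresh.get_prompts()["judge_prompt"].instruction
```

Keeping load_prompts separate from set_prompts, as in the interface described above, lets a caller inspect or filter the loaded dictionary before applying it to the metric.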
Workflow
A typical prompt persistence workflow proceeds as follows:
- Optimize -- Use a GeneticOptimizer or DSPyOptimizer to find the best prompt instructions.
- Apply -- Set the optimized instructions on the metric's prompts.
- Save -- Call metric.save_prompts("./prompts/") to persist to disk.
- Load -- In a new session or environment, call metric.load_prompts("./prompts/") to restore.
- Set -- Apply the loaded prompts with metric.set_prompts(**loaded_prompts).
Language Support
Prompts can be adapted to different languages using adapt_prompts(), then saved with the language-specific filename. When loading, the language parameter selects which version to load. This enables multilingual evaluation pipelines where the same metric logic operates in different languages with localized prompts.
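Language selection at load time can be sketched as a filename lookup. The helpers save_variant and load_variant, the prompt name, and the JSON layout are hypothetical; the sketch only assumes the {prompt_name}_{language}.json convention and the English default described above, and it does not call adapt_prompts() or any other Ragas API.

```python
import json
import os
import tempfile


def save_variant(directory, prompt_name, language, instruction):
    """Write one language variant of a prompt (hypothetical helper)."""
    path = os.path.join(directory, f"{prompt_name}_{language}.json")
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English text readable in the saved file.
        json.dump({"language": language, "instruction": instruction}, f, ensure_ascii=False)


def load_variant(directory, prompt_name, language=None):
    """Load the variant for the given language, defaulting to English."""
    language = language or "english"
    path = os.path.join(directory, f"{prompt_name}_{language}.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as directory:
    save_variant(directory, "judge_prompt", "english", "Is the answer supported by the context?")
    save_variant(directory, "judge_prompt", "spanish", "¿La respuesta está respaldada por el contexto?")

    default_prompt = load_variant(directory, "judge_prompt")
    spanish_prompt = load_variant(directory, "judge_prompt", language="spanish")
```

Since the language is part of the filename, both variants sit side by side in one directory and the caller picks one with a single parameter.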
Implemented By
- Implementation:Explodinggradients_Ragas_PromptMixin_Save_Load
See Also
- Genetic Prompt Optimization -- Produces prompts that benefit from persistence.
- DSPy Prompt Optimization -- Alternative optimization whose results should be persisted.
- Human Annotation Collection -- Source data for optimization; persistence avoids re-optimization.