Principle:Langgenius Dify Model and Prompt Configuration
| Knowledge Sources | |
|---|---|
| Domains | Prompt Engineering LLM Configuration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
LLM model selection and prompt engineering configuration defines how an AI application processes user input and generates responses by specifying the model provider, prompt templates, variable bindings, and completion parameters.
Description
Model and prompt configuration is the central nervous system of any LLM application. It governs three interrelated concerns:
1. Model Selection
The developer chooses a model provider (e.g., OpenAI, Anthropic, Azure OpenAI, local models) and a specific model ID (e.g., gpt-4, claude-3-opus) along with the model mode (chat or completion). This triple determines the API contract, token limits, pricing, and capability set available to the application.
2. Prompt Template Design
The prompt template is the instruction set that shapes the model's behavior. Dify supports two prompt modes:
- Simple mode -- A single prompt template string with variable placeholders (e.g.,
Template:User name,Template:Context). Variables are defined as a list of typed parameters that the end user fills in at runtime. - Advanced mode -- Separate system/user/assistant message configurations for chat models, or a structured completion prompt with prefix/suffix for completion models. Advanced mode also supports conversation history injection and context block placement.
Prompt variables define the dynamic inputs:
| Variable Property | Description |
|---|---|
| key | Variable identifier used in the template |
| name | Display label shown to end users |
| type | Data type (string, number, select) |
| required | Whether the variable must be provided |
| options | Available choices (for select type) |
3. Completion Parameters
These parameters control the stochastic behavior of the model's generation:
- temperature -- Controls randomness (0 = deterministic, 2 = highly creative)
- top_p -- Nucleus sampling threshold
- max_tokens -- Maximum length of the generated response
- presence_penalty -- Penalizes token repetition based on presence in the output so far
- frequency_penalty -- Penalizes token repetition based on frequency in the output so far
Usage
Configure model and prompt settings when:
- Setting up a new application after creation
- Tuning the quality, style, or cost of generated responses
- Switching between model providers for A/B testing or cost optimization
- Adding or modifying input variables that end users provide
Theoretical Basis
The configuration follows a layered abstraction pattern:
Layer 1: Provider Selection -> WHO generates
Layer 2: Prompt Template -> WHAT instructions to follow
Layer 3: Variable Injection -> WITH what dynamic context
Layer 4: Completion Parameters -> HOW to generate (sampling strategy)
The relationship between these layers can be expressed as:
FUNCTION generate_response(user_input, config):
model = RESOLVE_MODEL(config.provider, config.model_id)
prompt = RENDER_TEMPLATE(config.prompt_template, config.prompt_variables, user_input)
response = model.generate(
prompt,
temperature = config.completion_params.temperature,
top_p = config.completion_params.top_p,
max_tokens = config.completion_params.max_tokens,
presence_penalty = config.completion_params.presence_penalty,
frequency_penalty = config.completion_params.frequency_penalty
)
RETURN response
Key trade-offs:
- Temperature vs. reliability -- Higher temperature produces more creative but less predictable outputs. Production applications typically use lower values (0.1-0.7).
- Max tokens vs. cost -- Higher max_tokens allows longer responses but increases API costs proportionally.
- Simple vs. advanced prompt mode -- Simple mode is easier to configure but offers less control over multi-turn behavior and system instructions.