Principle:Langgenius Dify Model Configuration

From Leeroopedia
Knowledge Sources: Dify
Domains: LLM_Applications, Frontend, API
Last Updated: 2026-02-12 00:00 GMT

Overview

Description

Model Configuration governs how a Dify application selects and parameterizes the underlying Large Language Model. After an application is created, the next critical step is defining which LLM provider and model will power it, along with the inference parameters that control the model's output behavior.

The configuration surface encompasses:

  • Provider Selection -- Choosing the LLM provider (e.g., OpenAI, Anthropic, Azure OpenAI, Tongyi, Spark, MiniMax, Replicate, or Hugging Face Hub).
  • Model Identification -- Specifying the exact model within the provider (e.g., gpt-3.5-turbo, gpt-4).
  • Model Mode -- Selecting between chat mode (conversational, message-based) and completion mode (single prompt-response), which determines the API contract with the provider.
  • Completion Parameters -- Fine-tuning the inference behavior through:
    • max_tokens -- The maximum number of tokens in the generated response.
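    • temperature -- Controls sampling randomness (range: 0 to 2; values near 0 are close to deterministic, values near 2 are highly random).
    • top_p -- Nucleus sampling threshold (a cumulative probability between 0 and 1); an alternative to temperature for controlling diversity.
    • presence_penalty -- Penalizes tokens that have already appeared at all; positive values discourage the model from returning to the same topics, negative values encourage it (range: -2.0 to 2.0).
    • frequency_penalty -- Penalizes tokens in proportion to how often they have appeared; positive values discourage repeating specific tokens, negative values encourage it (range: -2.0 to 2.0).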
    • stop -- Up to 4 stop sequences that terminate generation.
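
The configuration surface listed above can be sketched as a TypeScript type with a small range check. This is only an illustration: the type names, the field layout, and the validateCompletionParams helper are assumptions made for this example and are not taken from Dify's code. The ranges come from the list above, except the 0 to 1 bound on top_p, which follows from it being a cumulative probability.

```typescript
// Hypothetical shapes for illustration; Dify's actual ModelConfig type may differ.
type ModelMode = "chat" | "completion";

interface CompletionParams {
  max_tokens: number;
  temperature: number;
  top_p: number;
  presence_penalty: number;
  frequency_penalty: number;
  stop: string[];
}

interface ModelSettings {
  provider: string;
  model_id: string;
  mode: ModelMode;
  completion_params: CompletionParams;
}

// Returns a list of human-readable problems; an empty list means every value
// falls inside the ranges quoted in the parameter list.
function validateCompletionParams(p: CompletionParams): string[] {
  const problems: string[] = [];
  const inRange = (v: number, lo: number, hi: number): boolean => v >= lo && v <= hi;

  if (!Number.isInteger(p.max_tokens) || p.max_tokens <= 0) {
    problems.push("max_tokens must be a positive integer");
  }
  if (!inRange(p.temperature, 0, 2)) problems.push("temperature must be in [0, 2]");
  if (!inRange(p.top_p, 0, 1)) problems.push("top_p must be in [0, 1]");
  if (!inRange(p.presence_penalty, -2, 2)) problems.push("presence_penalty must be in [-2, 2]");
  if (!inRange(p.frequency_penalty, -2, 2)) problems.push("frequency_penalty must be in [-2, 2]");
  if (p.stop.length > 4) problems.push("stop accepts at most 4 sequences");
  return problems;
}

const exampleSettings: ModelSettings = {
  provider: "openai",
  model_id: "gpt-4",
  mode: "chat",
  completion_params: {
    max_tokens: 512,
    temperature: 0.7,
    top_p: 1,
    presence_penalty: 0,
    frequency_penalty: 0,
    stop: [],
  },
};

console.log(validateCompletionParams(exampleSettings.completion_params));
```

Individual providers may accept narrower ranges or additional parameters, so a check like this is best treated as a first line of defense rather than a complete schema.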

The model configuration is persisted as part of the application's ModelConfig and can be updated at any time during the development cycle through the updateAppModelConfig service function.

Usage

Model configuration is used whenever a developer needs to:

  • Select or change the LLM provider and model for an application.
  • Tune generation quality by adjusting temperature, top_p, or penalty parameters.
  • Control response length via max_tokens.
  • Switch between chat and completion model modes to match the application's interaction pattern.
  • Optimize cost-performance tradeoffs by selecting different model tiers.

Theoretical Basis

Model Configuration applies the Strategy Pattern: the application delegates its text generation behavior to an interchangeable model strategy defined by the provider, model_id, and mode triple. This decouples the application logic from any specific LLM, enabling portability across providers.
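
A minimal sketch of that Strategy Pattern follows. Everything in it is hypothetical: ModelStrategy, makeStubStrategy, and DemoApp are names invented for this example, and the strategies return canned strings instead of calling a provider. It only shows how application code can stay unchanged while the (provider, model_id, mode) triple is swapped.

```typescript
type ModelMode = "chat" | "completion";

// Hypothetical strategy interface; not Dify's actual abstraction.
interface ModelStrategy {
  readonly provider: string;
  readonly modelId: string;
  readonly mode: ModelMode;
  generate(prompt: string): string;
}

// Builds a stand-in strategy that echoes the prompt with a tag,
// so the example runs without any network access.
function makeStubStrategy(provider: string, modelId: string, mode: ModelMode): ModelStrategy {
  return {
    provider,
    modelId,
    mode,
    generate: (prompt: string): string => `[${provider}/${modelId} ${mode}] ${prompt}`,
  };
}

// The application depends only on the interface, never on a concrete model.
class DemoApp {
  private strategy: ModelStrategy;

  constructor(strategy: ModelStrategy) {
    this.strategy = strategy;
  }

  setStrategy(strategy: ModelStrategy): void {
    this.strategy = strategy;
  }

  answer(question: string): string {
    return this.strategy.generate(question);
  }
}

const chatStrategy = makeStubStrategy("openai", "gpt-4", "chat");
// "example-provider" and "example-model" are placeholder names.
const completionStrategy = makeStubStrategy("example-provider", "example-model", "completion");

const app = new DemoApp(chatStrategy);
console.log(app.answer("Hello"));
app.setStrategy(completionStrategy);
console.log(app.answer("Hello"));
```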

The completion parameters draw from established sampling theory in language model inference:

  • Temperature scaling modifies the softmax distribution over the vocabulary, where lower values sharpen the distribution (favoring high-probability tokens) and higher values flatten it (increasing diversity).
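  • Nucleus sampling (top_p) truncates the probability distribution to the smallest set of highest-probability tokens whose cumulative probability reaches the threshold. Unlike temperature, which reshapes the whole distribution, it removes the low-probability tail, so the number of candidate tokens grows or shrinks with the model's confidence at each step.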
  • Presence and frequency penalties implement repetition control mechanisms that modify token log-probabilities based on whether (presence) or how often (frequency) tokens have appeared in the generated text.
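
The three mechanisms above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not how any particular provider implements sampling: the function names are invented here, and the penalty formula is one common formulation (subtract the frequency penalty once per prior occurrence and the presence penalty once if the token has occurred at all).

```typescript
// Softmax over logits divided by a temperature T > 0.
// Lower T sharpens the distribution; higher T flattens it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((z) => z / temperature);
  const maxLogit = Math.max(...scaled); // subtracted for numerical stability
  const exps = scaled.map((z) => Math.exp(z - maxLogit));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keeps the smallest set of highest-probability tokens whose cumulative
// probability reaches topP, zeroes the rest, then renormalizes.
function nucleusFilter(probs: number[], topP: number): number[] {
  const order = probs.map((p, i) => ({ p, i })).sort((a, b) => b.p - a.p);
  const kept = new Array<number>(probs.length).fill(0);
  let cumulative = 0;
  for (const { p, i } of order) {
    kept[i] = p;
    cumulative += p;
    if (cumulative >= topP) break;
  }
  const total = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / total);
}

// Adjusts each logit using how often the token has already been generated:
// logit - count * frequencyPenalty - (count > 0 ? presencePenalty : 0).
function applyPenalties(
  logits: number[],
  counts: number[],
  presencePenalty: number,
  frequencyPenalty: number,
): number[] {
  return logits.map((z, j) => {
    const count = counts[j] ?? 0;
    return z - count * frequencyPenalty - (count > 0 ? presencePenalty : 0);
  });
}

const demoLogits = [2.0, 1.0, 0.5];
console.log(softmaxWithTemperature(demoLogits, 0.5)); // sharper than T = 1
console.log(softmaxWithTemperature(demoLogits, 2.0)); // flatter than T = 1
console.log(nucleusFilter([0.5, 0.3, 0.2], 0.7));
console.log(applyPenalties([1, 1, 1], [0, 1, 3], 0.5, 0.1));
```

A temperature of exactly 0 cannot be passed to softmaxWithTemperature because of the division; implementations that accept 0 typically treat it as greedy selection of the highest-probability token.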

From an architectural standpoint, the updateAppModelConfig endpoint accepts a generic Record<string, any> body to accommodate the varying parameter schemas across different LLM providers, following the Tolerant Reader pattern.
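
The Tolerant Reader idea can be sketched as a reader that picks out only the fields it understands from a Record<string, any> body and ignores the rest. This is a hypothetical example: readCommonParams and the flat body layout are assumptions for illustration, the real request body for updateAppModelConfig is not shown on this page, and top_k stands in for an arbitrary provider-specific key.

```typescript
// The subset of parameters this hypothetical reader understands.
interface CommonParams {
  temperature?: number;
  max_tokens?: number;
  stop?: string[];
}

// Extracts known, well-typed fields and silently skips anything else,
// so provider-specific or malformed keys do not cause a failure.
function readCommonParams(body: Record<string, any>): CommonParams {
  const result: CommonParams = {};
  const temperature: unknown = body["temperature"];
  const maxTokens: unknown = body["max_tokens"];
  const stop: unknown = body["stop"];

  if (typeof temperature === "number") result.temperature = temperature;
  if (typeof maxTokens === "number") result.max_tokens = maxTokens;
  if (Array.isArray(stop) && stop.every((s: unknown) => typeof s === "string")) {
    result.stop = [...stop] as string[];
  }
  return result;
}

const exampleBody: Record<string, any> = {
  temperature: 0.2,
  max_tokens: "many", // wrong type: ignored rather than rejected
  top_k: 40, // hypothetical provider-specific key: ignored
};

console.log(readCommonParams(exampleBody));
```

The tradeoff of a loosely typed body is that mistakes such as a misspelled key pass through unnoticed, so stricter validation, if wanted, has to happen elsewhere.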
