Principle:Langgenius Dify Vector Database Selection

From Leeroopedia
Knowledge Sources: Dify
Domains: DevOps, Deployment, VectorDB, RAG
Last Updated: 2026-02-12 00:00 GMT

Overview

Vector Database Selection is the principle of decoupling the vector storage backend from the application layer so operators can choose the most appropriate vector database through a single environment variable.

Description

Dify's RAG (Retrieval-Augmented Generation) pipeline requires a vector database to store and retrieve document embeddings. Rather than hard-coding a single vector store, Dify abstracts the choice behind the VECTOR_STORE environment variable. This variable simultaneously controls two concerns:

  1. Application routing -- The API and worker services read VECTOR_STORE at runtime to instantiate the correct vector store client (Weaviate, Qdrant, Milvus, pgvector, etc.).
  2. Infrastructure provisioning -- Docker Compose profiles use the same variable to start only the container(s) needed for the selected vector database, avoiding resource waste.
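The application-routing concern can be pictured as a registry lookup keyed on VECTOR_STORE. The sketch below is illustrative only: the class names (WeaviateStore, QdrantStore, PgvectorStore), the REGISTRY mapping, and create_vector_store are hypothetical and do not reproduce Dify's actual factory code. The fallback to weaviate is an assumption based on the documented default.

```python
import os
from typing import Callable, Dict, Mapping, Optional


class VectorStore:
    """Hypothetical base class standing in for a real vector store client."""

    name = "base"


class WeaviateStore(VectorStore):
    name = "weaviate"


class QdrantStore(VectorStore):
    name = "qdrant"


class PgvectorStore(VectorStore):
    name = "pgvector"


# Hypothetical registry: VECTOR_STORE value -> factory for the matching client.
REGISTRY: Dict[str, Callable[[], VectorStore]] = {
    "weaviate": WeaviateStore,
    "qdrant": QdrantStore,
    "pgvector": PgvectorStore,
}


def create_vector_store(env: Optional[Mapping[str, str]] = None) -> VectorStore:
    """Instantiate the client selected by VECTOR_STORE (assumed default: weaviate)."""
    env = os.environ if env is None else env
    backend = env.get("VECTOR_STORE") or "weaviate"
    try:
        factory = REGISTRY[backend]
    except KeyError:
        raise ValueError(f"Unsupported VECTOR_STORE: {backend!r}") from None
    return factory()
```

Because callers only see the VectorStore interface, adding a backend in this sketch means registering one more factory rather than touching any call site.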

The supported vector databases include:

  • weaviate (default) -- Weaviate v1.27, with gRPC support and API-key authentication
  • qdrant -- Qdrant v1.8, lightweight and high-performance
  • milvus -- Milvus v2.6 with etcd and MinIO dependencies
  • pgvector -- PostgreSQL 16 with the pgvector extension
  • pgvecto-rs -- PostgreSQL 16 with the pgvecto.rs extension (Rust-based)
  • chroma -- ChromaDB v0.5 with token authentication
  • opensearch -- OpenSearch with dashboards
  • oceanbase -- OceanBase CE v4.3 with vector support
  • elasticsearch -- Elasticsearch v8.14
  • oracle -- Oracle Free with vector capabilities
  • opengauss -- openGauss v7.0 with vector support
  • myscale -- MyScaleDB v1.6 (ClickHouse-based)
  • And additional options: couchbase, vastbase, matrixone, iris, seekdb

This principle ensures that switching vector databases requires no code changes and no modification of the Docker Compose file -- only an update to the VECTOR_STORE value in .env.
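As a rough illustration of how small that change is, the following sketch rewrites a single key in the text of a .env file. The helper name set_env_value is hypothetical and not part of Dify; editing .env by hand achieves the same result.

```python
def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value`, appending the key if it is absent."""
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Leave commented-out assignments untouched.
        if line.strip().startswith("#"):
            continue
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Example: switch the vector store without touching any other setting.
updated = set_env_value("DB_TYPE=postgresql\nVECTOR_STORE=weaviate\n", "VECTOR_STORE", "qdrant")
```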

Usage

Use this principle when:

  • Choosing a vector database for a new Dify deployment.
  • Migrating from one vector database to another.
  • Evaluating different vector stores for performance benchmarking.
  • Running minimal infrastructure in development versus full-featured stores in production.

Theoretical Basis

The selection mechanism relies on Docker Compose profiles, a feature that conditionally starts services based on the active profile list. The COMPOSE_PROFILES variable is computed from the vector store and database type:

# In .env.example:
COMPOSE_PROFILES=${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}

When docker compose up runs, a service that declares a profiles key starts only if at least one of its profiles appears in COMPOSE_PROFILES. Services without a profiles key (api, worker, web, redis, nginx, etc.) always start.
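The ${VAR:-default} form substitutes the default when the variable is unset or empty. A minimal Python sketch of that resolution follows; resolve_compose_profiles is a hypothetical helper used for illustration, since Docker Compose performs this interpolation itself.

```python
from typing import List, Mapping


def resolve_compose_profiles(env: Mapping[str, str]) -> List[str]:
    """Mimic COMPOSE_PROFILES=${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}."""
    # `or` reproduces the ":-" rule: fall back when the value is unset OR empty.
    vector_store = env.get("VECTOR_STORE") or "weaviate"
    db_type = env.get("DB_TYPE") or "postgresql"
    return [vector_store, db_type]


# Example: only VECTOR_STORE is set, so DB_TYPE falls back to its default.
profiles = resolve_compose_profiles({"VECTOR_STORE": "qdrant"})
compose_profiles = ",".join(profiles)  # "qdrant,postgresql"
```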

Pseudocode: Profile-based service activation

1. Operator sets VECTOR_STORE=qdrant in .env
2. COMPOSE_PROFILES resolves to "qdrant,postgresql"
3. docker compose evaluates each service:
   - api (no profile)        -> STARTS (always)
   - worker (no profile)     -> STARTS (always)
   - redis (no profile)      -> STARTS (always)
   - weaviate (profile: weaviate) -> SKIPPED
   - qdrant (profile: qdrant)     -> STARTS (matches)
   - milvus (profile: milvus)     -> SKIPPED
   - db_postgres (profile: postgresql) -> STARTS (matches)
   - db_mysql (profile: mysql)    -> SKIPPED
4. Only the qdrant and db_postgres containers are launched alongside the core services

Each vector database service in docker-compose.yaml declares its own profile that matches the VECTOR_STORE value. Some databases require multiple supporting containers (e.g., Milvus needs etcd and MinIO), all sharing the same profile tag.
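The activation rule and the shared profile tag can be simulated in a few lines. In this sketch the SERVICES table is a hand-written stand-in for docker-compose.yaml, not a copy of it: the service names etcd and minio are assumptions chosen to illustrate Milvus's supporting containers, and services_to_start is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Hypothetical excerpt: service name -> profiles it declares (empty = no profiles key).
SERVICES: Dict[str, List[str]] = {
    "api": [],
    "worker": [],
    "redis": [],
    "weaviate": ["weaviate"],
    "qdrant": ["qdrant"],
    "etcd": ["milvus"],    # assumed name; shares the milvus profile
    "minio": ["milvus"],   # assumed name; shares the milvus profile
    "milvus": ["milvus"],
    "db_postgres": ["postgresql"],
    "db_mysql": ["mysql"],
}


def services_to_start(services: Dict[str, List[str]], active: Iterable[str]) -> List[str]:
    """Return services that have no profile or at least one active profile."""
    active_set = set(active)
    return [
        name
        for name, profiles in services.items()
        if not profiles or active_set.intersection(profiles)
    ]


qdrant_stack = services_to_start(SERVICES, ["qdrant", "postgresql"])
milvus_stack = services_to_start(SERVICES, ["milvus", "postgresql"])
```

With the qdrant profile active, the result is api, worker, redis, qdrant, and db_postgres. Switching to milvus brings up etcd and minio as well, because all three services carry the same profile tag.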
