Principle:Langgenius Dify Vector Database Selection
| Knowledge Sources | Dify |
|---|---|
| Domains | DevOps, Deployment, VectorDB, RAG |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Vector Database Selection is the principle of decoupling the vector storage backend from the application layer so operators can choose the most appropriate vector database through a single environment variable.
Description
Dify's RAG (Retrieval-Augmented Generation) pipeline requires a vector database to store and retrieve document embeddings. Rather than hard-coding a single vector store, Dify abstracts the choice behind the VECTOR_STORE environment variable. This variable simultaneously controls two concerns:
- Application routing -- The API and worker services read VECTOR_STORE at runtime to instantiate the correct vector store client (Weaviate, Qdrant, Milvus, pgvector, etc.).
- Infrastructure provisioning -- Docker Compose profiles use the same variable to start only the container(s) needed for the selected vector database, avoiding resource waste.
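The application-routing concern can be pictured as a small factory keyed on the variable. The sketch below is illustrative only: the class names (WeaviateStore, QdrantStore), the REGISTRY mapping, and create_vector_store are hypothetical stand-ins, not Dify's actual classes or module layout. It assumes the same default (weaviate) that the Compose configuration uses.

```python
import os
from typing import Callable, Dict, Mapping, Optional


class VectorStore:
    """Hypothetical base class standing in for a real vector store client."""
    name = "base"


class WeaviateStore(VectorStore):
    name = "weaviate"


class QdrantStore(VectorStore):
    name = "qdrant"


# Hypothetical registry: maps a VECTOR_STORE value to a client factory.
REGISTRY: Dict[str, Callable[[], VectorStore]] = {
    "weaviate": WeaviateStore,
    "qdrant": QdrantStore,
}


def create_vector_store(env: Optional[Mapping[str, str]] = None) -> VectorStore:
    """Pick the client from VECTOR_STORE, falling back to weaviate."""
    env = os.environ if env is None else env
    store = env.get("VECTOR_STORE") or "weaviate"
    try:
        factory = REGISTRY[store]
    except KeyError:
        raise ValueError(f"Unsupported VECTOR_STORE: {store!r}") from None
    return factory()
```

Under these assumptions, create_vector_store({"VECTOR_STORE": "qdrant"}) returns a QdrantStore, and supporting another backend means adding one registry entry rather than touching the calling code.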
The supported vector databases include:
- weaviate (default) -- Weaviate v1.27, with gRPC support and API-key authentication
- qdrant -- Qdrant v1.8, lightweight and high-performance
- milvus -- Milvus v2.6 with etcd and MinIO dependencies
- pgvector -- PostgreSQL 16 with the pgvector extension
- pgvecto-rs -- PostgreSQL 16 with the pgvecto.rs extension (Rust-based)
- chroma -- ChromaDB v0.5 with token authentication
- opensearch -- OpenSearch with dashboards
- oceanbase -- OceanBase CE v4.3 with vector support
- elasticsearch -- Elasticsearch v8.14
- oracle -- Oracle Free with vector capabilities
- opengauss -- openGauss v7.0 with vector support
- myscale -- MyScaleDB v1.6 (ClickHouse-based)
- And additional options: couchbase, vastbase, matrixone, iris, seekdb
This principle ensures that switching vector databases requires no code changes and no modification of the Docker Compose file -- only an update to the VECTOR_STORE value in .env.
Usage
Use this principle when:
- Choosing a vector database for a new Dify deployment.
- Migrating from one vector database to another.
- Evaluating different vector stores for performance benchmarking.
- Running minimal infrastructure in development versus full-featured stores in production.
Theoretical Basis
The selection mechanism relies on Docker Compose profiles, a feature that conditionally starts services based on the active profile list. The COMPOSE_PROFILES variable is computed from the vector store and database type:
# In .env.example:
COMPOSE_PROFILES=${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}
When docker compose up runs, a service that declares a profiles list starts only if at least one of its profiles appears in COMPOSE_PROFILES. Services without a profiles key (api, worker, web, redis, nginx, etc.) always start.
Pseudocode: Profile-based service activation
1. Operator sets VECTOR_STORE=qdrant in .env
2. COMPOSE_PROFILES resolves to "qdrant,postgresql"
3. docker compose evaluates each service:
   - api (no profile) -> STARTS (always)
   - worker (no profile) -> STARTS (always)
   - redis (no profile) -> STARTS (always)
   - weaviate (profile: weaviate) -> SKIPPED
   - qdrant (profile: qdrant) -> STARTS (matches)
   - milvus (profile: milvus) -> SKIPPED
   - db_postgres (profile: postgresql) -> STARTS (matches)
   - db_mysql (profile: mysql) -> SKIPPED
4. Only the qdrant and db_postgres containers are launched alongside the core services
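The activation steps above can be simulated in a few lines of Python. This is a sketch of the selection logic, not Docker Compose's implementation: the SERVICE_PROFILES table is a hand-written subset taken from the pseudocode, and resolve_profiles and started_services are hypothetical helper names. The `or` fallback mirrors the `${VAR:-default}` form, which applies the default when the variable is unset or empty.

```python
from typing import Dict, List, Mapping

# Illustrative subset of services and the profiles each one declares.
# An empty list models a service with no "profiles" key.
SERVICE_PROFILES: Dict[str, List[str]] = {
    "api": [],
    "worker": [],
    "redis": [],
    "weaviate": ["weaviate"],
    "qdrant": ["qdrant"],
    "milvus": ["milvus"],
    "db_postgres": ["postgresql"],
    "db_mysql": ["mysql"],
}


def resolve_profiles(env: Mapping[str, str]) -> List[str]:
    """Mimic COMPOSE_PROFILES=${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}."""
    vector_store = env.get("VECTOR_STORE") or "weaviate"
    db_type = env.get("DB_TYPE") or "postgresql"
    return [vector_store, db_type]


def started_services(
    env: Mapping[str, str],
    services: Mapping[str, List[str]] = SERVICE_PROFILES,
) -> List[str]:
    """Return services that would start: unprofiled ones plus profile matches."""
    active = set(resolve_profiles(env))
    return [
        name
        for name, profiles in services.items()
        if not profiles or active.intersection(profiles)
    ]


if __name__ == "__main__":
    print(started_services({"VECTOR_STORE": "qdrant"}))
```

With VECTOR_STORE=qdrant and DB_TYPE unset, the helper selects api, worker, redis, qdrant, and db_postgres, matching the walkthrough.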
Each vector database service in docker-compose.yaml declares its own profile that matches the VECTOR_STORE value. Some databases require multiple supporting containers (e.g., Milvus needs etcd and MinIO), all sharing the same profile tag.
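To see how one profile tag can bring up several containers, the services can be grouped by profile. The sketch below is a hedged illustration: the (service, profile) pairs are assumed names chosen to match the Milvus example, not a copy of the real docker-compose.yaml, and containers_by_profile is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed (service name, profile tag) pairs for illustration only.
SERVICES: List[Tuple[str, str]] = [
    ("qdrant", "qdrant"),
    ("etcd", "milvus"),
    ("minio", "milvus"),
    ("milvus-standalone", "milvus"),
]


def containers_by_profile(services: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group service names under the profile tag they share."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for service, profile in services:
        grouped[profile].append(service)
    return dict(grouped)


if __name__ == "__main__":
    for profile, names in containers_by_profile(SERVICES).items():
        print(f"{profile}: {', '.join(names)}")
```

Under these assumptions, selecting milvus activates three containers while selecting qdrant activates one, which is why the operator only ever needs to set the single VECTOR_STORE value.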