Principle:FlowiseAI Flowise Vector Store Upsert
| Attribute | Value |
|---|---|
| Sources | packages/ui/src/api/documentstore.js |
| Domains | Document_Store_Ingestion |
| Last Updated | 2026-02-12 14:00 GMT |
Overview
Vector_Store_Upsert is a technique for upserting processed document chunks with their embeddings into a configured vector store for similarity search retrieval. This operation transforms text chunks into searchable vector representations, completing the document ingestion pipeline.
Description
The upsert operation takes stored chunks, generates embeddings using the configured embedding provider, and inserts/updates them in the vector store. The operation supports incremental updates via record managers that track which documents have been processed, enabling efficient re-indexing.
The upsert workflow encompasses several coordinated steps:
- Embedding generation -- Each chunk's `pageContent` is passed through the configured embedding provider (e.g., OpenAI, HuggingFace) to produce a dense vector representation. The vector dimension depends on the embedding model (e.g., 1536 for OpenAI ada-002, 768 for many open-source models).
- Vector storage -- The generated embeddings, along with the chunk's text content and metadata, are written to the configured vector store. The "upsert" semantics mean existing vectors for the same document are updated rather than duplicated.
- Record management (optional) -- When a record manager is configured, the system tracks document hashes to determine which chunks are new, modified, or unchanged. This enables incremental updates that skip unchanged chunks, significantly reducing processing time and cost for large document sets.
- Result reporting -- The operation returns counts of added, updated, skipped, and deleted records, providing transparency into what changed during the upsert.
The upsert can target the entire store or a specific document within the store, controlled by the optional docId parameter.
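The record-manager comparison described above can be sketched as a pure hash diff. This is an illustration, not the actual Flowise implementation: `diffRecords` and the id-to-hash maps are hypothetical names chosen for the example.

```javascript
// Illustrative hash diff: classify chunks as new, modified, unchanged, or removed.
// `storedHashes` and `currentHashes` map chunk id → content hash (hypothetical shapes).
function diffRecords(storedHashes, currentHashes) {
  const plan = { add: [], update: [], skip: [], remove: [] }
  for (const [id, hash] of Object.entries(currentHashes)) {
    if (!(id in storedHashes)) plan.add.push(id) // new chunk → insert
    else if (storedHashes[id] !== hash) plan.update.push(id) // changed → re-embed
    else plan.skip.push(id) // unchanged → no work
  }
  for (const id of Object.keys(storedHashes)) {
    if (!(id in currentHashes)) plan.remove.push(id) // gone → delete its vector
  }
  return plan
}

const plan = diffRecords(
  { 'doc-1#0': 'aaa', 'doc-1#1': 'bbb' },
  { 'doc-1#0': 'aaa', 'doc-1#1': 'ccc', 'doc-1#2': 'ddd' }
)
// plan: { add: ['doc-1#2'], update: ['doc-1#1'], skip: ['doc-1#0'], remove: [] }
```

Only the `update` and `add` chunks need fresh embeddings; `skip` chunks bypass the embedding provider entirely, which is where the incremental savings come from.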
Usage
Use vector store upsert when indexing document chunks into a vector store for RAG retrieval. Typical scenarios include:
- Initial indexing -- Upserting all chunks from a newly processed document store for the first time.
- Incremental updates -- Re-upserting after adding new documents or re-processing existing ones. The record manager ensures only changed chunks are re-embedded.
- Single document re-index -- Upserting chunks for a specific document after editing its chunks, using the `docId` parameter.
- Provider migration -- Re-upserting all chunks to a new vector store after changing providers.
```javascript
// Upsert all chunks from a document store
const result = await documentStoreApi.insertIntoVectorStore({
    storeId: 'store-123',
    isStrictSave: true,
    embeddingName: 'openAIEmbeddings',
    embeddingConfig: { modelName: 'text-embedding-ada-002', credential: 'cred-abc' },
    vectorStoreName: 'pinecone',
    vectorStoreConfig: { index: 'my-index', namespace: 'docs' },
    recordManagerName: 'postgresRecordManager',
    recordManagerConfig: { tableName: 'records', cleanup: 'incremental' }
})
console.log(`Added: ${result.data.numAdded}, Updated: ${result.data.numUpdated}`)
```
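The result-reporting step can be turned into a small summary helper. `numAdded` and `numUpdated` appear in the example above; `numSkipped` and `numDeleted` are assumed here from the added/updated/skipped/deleted counts the Description mentions, so treat the exact field names as illustrative.

```javascript
// Hypothetical helper: summarize the upsert result counts into one line.
// Missing fields default to 0 so partial result objects still format cleanly.
function summarizeUpsert(counts) {
  const { numAdded = 0, numUpdated = 0, numSkipped = 0, numDeleted = 0 } = counts
  return `added=${numAdded} updated=${numUpdated} skipped=${numSkipped} deleted=${numDeleted}`
}

console.log(summarizeUpsert({ numAdded: 10, numUpdated: 2, numSkipped: 88 }))
// → added=10 updated=2 skipped=88 deleted=0
```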
Theoretical Basis
Vector store upsert implements an upsert (insert-or-update) pattern with embedding generation:
- Dense vector representation -- Each chunk's text is transformed into a dense vector in a high-dimensional space (typically 768-1536 dimensions) by the embedding model. Semantically similar texts produce vectors that are close together in this space, enabling similarity search via distance metrics (cosine similarity, dot product, Euclidean distance).
- Upsert semantics -- The "upsert" operation combines insert and update: if a vector with the same identifier already exists, it is updated in place; otherwise, a new vector is inserted. This idempotent behavior is critical for re-indexing workflows where the same document may be processed multiple times.
- Record manager for incremental updates -- The record manager maintains a hash of each chunk's content and metadata. On subsequent upserts, it compares current hashes with stored hashes to determine which chunks are new (insert), modified (update), unchanged (skip), or no longer present (delete). This optimization can reduce upsert time by orders of magnitude for large, slowly-changing document sets.
- Strict save mode -- The `isStrictSave` parameter controls whether the vector store configuration is persisted alongside the upsert operation. When true, the configuration is saved for future reference and re-use.
- Batch processing -- Vector stores typically support batch operations, and the upsert sends chunks in batches to optimize throughput. The batch size and concurrency depend on the specific vector store provider's capabilities and rate limits.
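The batching step above reduces to a simple chunking loop. The batch size here is arbitrary for illustration; real limits are provider-specific.

```javascript
// Split chunks into fixed-size batches before sending them to the vector store.
function toBatches(items, batchSize) {
  const batches = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

toBatches(['c0', 'c1', 'c2', 'c3', 'c4'], 2)
// → [['c0', 'c1'], ['c2', 'c3'], ['c4']]
```

In practice each batch would be sent as one write request, with concurrency capped to stay under the provider's rate limits.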
The quality of embeddings directly determines retrieval quality. The embedding model's training data, dimensionality, and architecture all influence how well semantic similarity is captured for the specific domain of the ingested documents.
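The similarity metrics mentioned above can be made concrete with a minimal cosine-similarity sketch over two embedding vectors: the score is 1 for vectors pointing in the same direction and 0 for orthogonal ones.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i] // dot product accumulates co-alignment
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])) // 1 — identical direction
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])) // 0 — orthogonal
```

Retrieval then amounts to embedding the query and returning the stored vectors with the highest similarity scores; the vector store performs this search with approximate-nearest-neighbor indexes rather than a linear scan.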
Related Pages
- Implementation:FlowiseAI_Flowise_InsertIntoVectorStore
- Heuristic:FlowiseAI_Flowise_Document_Loader_Bypass_Optimization
- Principle:FlowiseAI_Flowise_Vector_Store_Provider_Configuration -- Previous step: configuring the embedding and vector store providers
- Principle:FlowiseAI_Flowise_Chunk_Management -- Ensuring chunk quality before upsert
- Principle:FlowiseAI_Flowise_Vector_Store_Query -- Next step: testing retrieval quality after upsert