
Principle:FlowiseAI Flowise Vector Store Upsert

From Leeroopedia
Sources: packages/ui/src/api/documentstore.js
Domains: Document_Store_Ingestion
Last Updated: 2026-02-12 14:00 GMT

Overview

Vector_Store_Upsert is the technique of writing processed document chunks, together with their embeddings, into a configured vector store for similarity-search retrieval. This operation transforms text chunks into searchable vector representations, completing the document ingestion pipeline.

Description

The upsert operation takes stored chunks, generates embeddings using the configured embedding provider, and inserts/updates them in the vector store. The operation supports incremental updates via record managers that track which documents have been processed, enabling efficient re-indexing.

The upsert workflow encompasses several coordinated steps:

  • Embedding generation -- Each chunk's pageContent is passed through the configured embedding provider (e.g., OpenAI, HuggingFace) to produce a dense vector representation. The vector dimension depends on the embedding model (e.g., 1536 for OpenAI ada-002, 768 for many open-source models).
  • Vector storage -- The generated embeddings, along with the chunk's text content and metadata, are written to the configured vector store. The "upsert" semantics mean existing vectors for the same document are updated rather than duplicated.
  • Record management (optional) -- When a record manager is configured, the system tracks document hashes to determine which chunks are new, modified, or unchanged. This enables incremental updates that skip unchanged chunks, significantly reducing processing time and cost for large document sets.
  • Result reporting -- The operation returns counts of added, updated, skipped, and deleted records, providing transparency into what changed during the upsert.

The upsert can target the entire store or a specific document within the store, controlled by the optional docId parameter.

Usage

Use vector store upsert when indexing document chunks into a vector store for RAG retrieval. Typical scenarios include:

  • Initial indexing -- Upserting all chunks from a newly processed document store for the first time.
  • Incremental updates -- Re-upserting after adding new documents or re-processing existing ones. The record manager ensures only changed chunks are re-embedded.
  • Single document re-index -- Upserting chunks for a specific document after editing its chunks, using the docId parameter.
  • Provider migration -- Re-upserting all chunks to a new vector store after changing providers.

// Upsert all chunks from a document store
const result = await documentStoreApi.insertIntoVectorStore({
    storeId: 'store-123',
    isStrictSave: true,
    embeddingName: 'openAIEmbeddings',
    embeddingConfig: { modelName: 'text-embedding-ada-002', credential: 'cred-abc' },
    vectorStoreName: 'pinecone',
    vectorStoreConfig: { index: 'my-index', namespace: 'docs' },
    recordManagerName: 'postgresRecordManager',
    recordManagerConfig: { tableName: 'records', cleanup: 'incremental' }
})
console.log(`Added: ${result.data.numAdded}, Updated: ${result.data.numUpdated}`)
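For the single-document case, the same endpoint can be scoped with the optional docId parameter described above. The wrapper below is a sketch; the parameter values are illustrative placeholders, and the `api` argument lets the call shape be exercised against a stub.

```javascript
// Hypothetical helper: re-index only one document's chunks by passing docId.
// Mirrors the full-store example above; values are placeholders.
async function reindexDocument(api, storeId, docId) {
    return api.insertIntoVectorStore({
        storeId,
        docId,               // restrict the upsert to this one document
        isStrictSave: false, // do not overwrite the store's saved config
        embeddingName: 'openAIEmbeddings',
        embeddingConfig: { modelName: 'text-embedding-ada-002', credential: 'cred-abc' },
        vectorStoreName: 'pinecone',
        vectorStoreConfig: { index: 'my-index', namespace: 'docs' }
    })
}
```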

Theoretical Basis

Vector store upsert implements an upsert (insert-or-update) pattern with embedding generation:

  • Dense vector representation -- Each chunk's text is transformed into a dense vector in a high-dimensional space (typically 768-1536 dimensions) by the embedding model. Semantically similar texts produce vectors that are close together in this space, enabling similarity search via distance metrics (cosine similarity, dot product, Euclidean distance).
  • Upsert semantics -- The "upsert" operation combines insert and update: if a vector with the same identifier already exists, it is updated in place; otherwise, a new vector is inserted. This idempotent behavior is critical for re-indexing workflows where the same document may be processed multiple times.
  • Record manager for incremental updates -- The record manager maintains a hash of each chunk's content and metadata. On subsequent upserts, it compares current hashes with stored hashes to determine which chunks are new (insert), modified (update), unchanged (skip), or no longer present (delete). This optimization can reduce upsert time by orders of magnitude for large, slowly-changing document sets.
  • Strict save mode -- The isStrictSave parameter controls whether the vector store configuration is persisted alongside the upsert operation. When true, the configuration is saved for future reference and re-use.
  • Batch processing -- Vector stores typically support batch operations, and the upsert sends chunks in batches to optimize throughput. The batch size and concurrency depend on the specific vector store provider's capabilities and rate limits.
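The similarity-search side of the theory can be made concrete with a small cosine-similarity routine over toy vectors. This is an illustration of the distance metric only; production vector stores use approximate nearest-neighbour indexes rather than the linear scan shown here.

```javascript
// Cosine similarity between two dense vectors: 1.0 means same direction,
// 0.0 means orthogonal (no similarity under this metric).
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Linear-scan nearest neighbour over (id, vector) entries; a real vector
// store replaces this with an ANN index for scale.
function mostSimilar(query, entries) {
    return entries.reduce((best, e) =>
        cosineSimilarity(query, e.vector) > cosineSimilarity(query, best.vector)
            ? e : best)
}
```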

The quality of embeddings directly determines retrieval quality. The embedding model's training data, dimensionality, and architecture all influence how well semantic similarity is captured for the specific domain of the ingested documents.
