Principle:Run llama Llama index Vector Index Construction
Overview
Vector Index Construction is the principle of transforming textual document chunks into dense numerical vectors and organizing them in a data structure optimized for similarity-based retrieval. This is the core mechanism that enables Retrieval-Augmented Generation (RAG) systems to find contextually relevant passages given a natural language query, without relying on exact keyword matching.
In LlamaIndex, vector index construction encompasses the entire pipeline from receiving processed nodes (document chunks) through embedding generation to storage in a vector store that supports efficient nearest neighbor search.
Vector Search, RAG Pipeline, Information Retrieval, LlamaIndex Core
Vector Similarity Search as the Foundation for Retrieval
Traditional information retrieval systems rely on lexical matching: they find documents that share exact terms with the query and score them with schemes such as TF-IDF or BM25. These methods work well for keyword-oriented queries but fail when the user and the document express the same concept in different words. For example, in a purely lexical system a query for "car repair" would not match a document about "automobile maintenance".
Vector similarity search solves this by operating in semantic space rather than lexical space:
- Each text passage is converted into a dense vector (an array of floating-point numbers, typically 256-3072 dimensions) using an embedding model.
- The embedding model is trained such that texts with similar meanings produce vectors that are close together in the vector space.
- At query time, the query is also embedded, and the system finds stored vectors that are nearest to the query vector.
This approach captures semantic similarity: "car repair" and "automobile maintenance" will have nearby vectors because the embedding model has learned that these phrases are conceptually related.
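As a minimal sketch of this idea, the following ranks three stored passages against a query by cosine similarity. The 3-dimensional vectors are hand-picked, hypothetical values standing in for the output of a real embedding model, and the helper name cosine_similarity is illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a real model would produce these vectors.
stored = {
    "automobile maintenance": [0.9, 0.1, 0.0],
    "chocolate cake recipe": [0.0, 0.2, 0.9],
    "tax filing deadlines": [0.1, 0.9, 0.1],
}

# Pretend this is the embedding of the query "car repair".
query_vector = [0.8, 0.2, 0.1]

# Rank stored passages from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda text: cosine_similarity(query_vector, stored[text]),
    reverse=True,
)
best_match = ranked[0]
print(best_match)
```

With these made-up vectors the query lands closest to "automobile maintenance" even though the two phrases share no words, which is the behavior lexical matching cannot provide.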
Embedding Documents into Vector Space
What Are Embeddings?
An embedding is a learned mapping from variable-length text to a fixed-length dense vector:
embed: text -> R^d
where d is the dimensionality of the embedding space (e.g., 1536 for OpenAI's text-embedding-ada-002, 384 for all-MiniLM-L6-v2).
Key properties of well-trained embeddings:
- Semantic proximity: Similar texts produce nearby vectors.
- Compositionality: Embeddings reflect the meaning of a passage as a whole, including how its concepts relate, not just the individual words it contains.
- Cross-lingual potential: Multilingual embedding models can place equivalent texts in different languages near each other.
The Embedding Process in Index Construction
During vector index construction, the system performs the following steps:
- Receive nodes: Document chunks (nodes) arrive from the ingestion pipeline, each with text content and metadata.
- Batch embedding: The text of each node is passed through the embedding model, typically in batches for efficiency.
- Vector storage: Each node's embedding vector, along with its text and metadata, is stored in the vector store.
- Index structure: The vector store organizes the vectors for efficient retrieval (flat storage, tree-based, graph-based, or quantized depending on the backend).
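The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not LlamaIndex's implementation: toy_embed is a hypothetical hashed bag-of-words stand-in for an embedding model (it is not learned and captures no semantics), nodes are plain dictionaries, and the "vector store" is a flat list.

```python
import math
import zlib

EMBED_DIM = 16  # hypothetical dimensionality; real models use hundreds or more


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized.

    Not a learned embedding. It only shows the contract
    "variable-length text in, fixed-length vector out".
    """
    vec = [0.0] * EMBED_DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % EMBED_DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def embed_batch(texts):
    """Embed several texts in one call, as a batched API would."""
    return [toy_embed(t) for t in texts]


def build_index(nodes, batch_size=2):
    """Receive nodes, embed them in batches, store vector + text + metadata."""
    store = []  # flat storage: scanned linearly at query time
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        vectors = embed_batch([node["text"] for node in batch])
        for node, vector in zip(batch, vectors):
            store.append(
                {"vector": vector, "text": node["text"], "metadata": node["metadata"]}
            )
    return store


def query(store, text, top_k=1):
    """Embed the query and return the top_k entries by dot product."""
    q = toy_embed(text)
    ranked = sorted(
        store,
        key=lambda entry: sum(a * b for a, b in zip(q, entry["vector"])),
        reverse=True,
    )
    return ranked[:top_k]


nodes = [
    {"text": "vector stores hold embeddings", "metadata": {"source": "a.txt"}},
    {"text": "bread needs flour and water", "metadata": {"source": "b.txt"}},
    {"text": "graphs connect nearby vectors", "metadata": {"source": "c.txt"}},
]
index = build_index(nodes)
top = query(index, "vector stores hold embeddings")
print(top[0]["metadata"]["source"])
```

A production backend would replace the flat list with one of the index structures named in the last step, but the flow of nodes in and vectors plus text plus metadata out stays the same.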
Approximate Nearest Neighbor (ANN) Search
For small collections (thousands of vectors), exact nearest neighbor search via brute-force comparison is feasible. For larger collections (millions to billions), exact search becomes impractical, and Approximate Nearest Neighbor (ANN) algorithms are used.
ANN algorithms trade a small amount of accuracy for dramatic speed improvements:
| Algorithm Family | Examples | Approach | Trade-off |
|---|---|---|---|
| Tree-based | Annoy, KD-trees | Partition the vector space into regions using tree structures | Fast for low dimensions; degrades in high dimensions |
| Hash-based | LSH (Locality-Sensitive Hashing) | Map similar vectors to the same hash bucket | Simple but less accurate than graph-based methods |
| Graph-based | HNSW (Hierarchical Navigable Small World) | Build a navigable graph connecting nearby vectors | Excellent recall-speed trade-off; widely used in production |
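| Quantization-based | PQ (Product Quantization), often combined with IVF (inverted file index) | Compress vectors and search in compressed space; IVF additionally restricts the search to the nearest clusters | Reduces memory footprint significantly at some cost in recall |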
LlamaIndex's default in-memory vector store uses flat (exact) search, which is suitable for prototyping. For production workloads, LlamaIndex integrates with dedicated vector databases (Pinecone, Weaviate, Qdrant, Chroma, Milvus, and others) that implement ANN algorithms.
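To make the accuracy-for-speed trade concrete, here is a toy version of the hash-based row in the table: random-hyperplane LSH written from scratch. It is a sketch for illustration only and does not reflect how any of the listed vector databases is implemented; the function names, the seeds, and the choice of 4 hyperplanes are all assumptions.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Random hyperplanes through the origin, one list of coefficients each."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group vector ids by signature; similar directions tend to share a bucket."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(idx)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Only vectors in the query's bucket would be compared exactly."""
    return buckets.get(signature(query, hyperplanes), [])


rng = random.Random(1)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
planes = make_hyperplanes(num_planes=4, dim=8)
buckets = build_buckets(vectors, planes)

# A query pointing in the same direction as vectors[0], but twice as long.
query = [2.0 * x for x in vectors[0]]
shortlist = candidates(query, buckets, planes)
print(len(shortlist), "candidates out of", len(vectors))
```

The approximation comes from the bucket boundary: a true nearest neighbor that falls on the other side of one hyperplane is never examined. Real systems mitigate this with multiple hash tables or, more commonly today, graph-based indexes such as HNSW.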
Distance Metrics
The choice of distance metric determines how "closeness" between vectors is measured:
Cosine Similarity
Measures the angle between two vectors, ignoring magnitude:
cosine_sim(a, b) = (a . b) / (||a|| * ||b||)
- Range: [-1, 1] (1 = identical direction, 0 = orthogonal, -1 = opposite)
- Most commonly used for text embeddings because it ignores vector magnitude, so the score reflects direction (meaning) rather than vector length.
- Equivalent to dot product when vectors are L2-normalized.
Dot Product (Inner Product)
Measures both directional similarity and magnitude:
dot(a, b) = sum(a_i * b_i)
- Range: unbounded
- Useful when vector magnitude carries meaning (e.g., document importance).
- For normalized vectors, equivalent to cosine similarity.
Euclidean Distance (L2)
Measures the straight-line distance between two points in vector space:
L2(a, b) = sqrt(sum((a_i - b_i)^2))
- Range: [0, infinity) (0 = identical)
- Less commonly used for text retrieval but important in some specialized applications.
| Metric | Best For | Normalization Required | LlamaIndex Default |
|---|---|---|---|
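| Cosine Similarity | General text retrieval | No (the formula divides by the vector norms) | Yes (most vector stores) |
| Dot Product | Pre-normalized embeddings, or models whose vector magnitude carries meaning | Only when cosine-equivalent ranking is wanted (L2 normalization) | Some backends |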
| Euclidean (L2) | Spatial/geometric data | Optional | Rarely used for text |
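The three formulas above translate directly into a few lines of Python. The sketch below uses two small example vectors chosen for easy arithmetic and checks the stated equivalence between dot product and cosine similarity after L2 normalization; the helper names are illustrative.

```python
import math


def dot(a, b):
    """Inner product: sum(a_i * b_i)."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length ||a||."""
    return math.sqrt(dot(a, a))


def cosine_sim(a, b):
    """(a . b) / (||a|| * ||b||)."""
    return dot(a, b) / (norm(a) * norm(b))


def l2(a, b):
    """sqrt(sum((a_i - b_i)^2))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = normalize(a), normalize(b)

raw_dot = dot(a, b)                     # magnitude-sensitive
cos = cosine_sim(a, b)                  # magnitude-insensitive
dist = l2(a, b)                         # straight-line distance
unit_dot = dot(a_unit, b_unit)          # matches cos after normalization
unit_dist_sq = l2(a_unit, b_unit) ** 2  # equals 2 - 2 * cos for unit vectors

print(raw_dot, cos, dist, unit_dot, unit_dist_sq)
```

For these inputs the dot product is 24, the cosine similarity is about 0.96, and the L2 distance is about 1.414. On unit vectors the squared L2 distance equals 2 - 2 * cosine, so all three metrics produce the same ranking once embeddings are normalized.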
Index Construction Considerations
Batch Size
Embedding API calls are typically batched for throughput. The batch size controls how many nodes are embedded in a single API call. Larger batches improve throughput but require more memory and may hit API rate limits.
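The effect of batch size on the number of API calls can be shown with a small sketch. Here fake_embed_batch is a hypothetical stand-in for a remote embedding endpoint, and embed_all is an illustrative helper, not a LlamaIndex function.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_batch_fn, batch_size):
    """Embed every text, returning the vectors and the number of calls made."""
    vectors = []
    calls = 0
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_batch_fn(batch))
        calls += 1
    return vectors, calls


def fake_embed_batch(batch):
    """Hypothetical API stand-in: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]


texts = [f"node {i}" for i in range(25)]
vectors_small, calls_small = embed_all(texts, fake_embed_batch, batch_size=10)
vectors_large, calls_large = embed_all(texts, fake_embed_batch, batch_size=25)
print(calls_small, calls_large)
```

Embedding 25 nodes takes three calls at a batch size of 10 and a single call at 25; the larger batch saves round trips but sends a bigger payload per request, which is where memory and rate-limit pressure comes from.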
Storage Context
LlamaIndex uses a StorageContext to abstract the underlying storage backend. The same index construction code works whether vectors are stored:
- In memory (default, for development)
- In a local persistent store (e.g., Chroma, FAISS on disk)
- In a managed cloud service (e.g., Pinecone, Weaviate, Qdrant)
Incremental Construction
Indices can be built incrementally:
- from_documents: Build a complete index from a list of documents (includes chunking and embedding).
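- insert: Add a new document to an existing index; it is chunked and embedded like the original documents (insert_nodes adds already-parsed nodes).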
- from_vector_store: Wrap an existing, pre-populated vector store as an index.
This flexibility supports both batch ingestion and real-time updates.
Transformations
Before embedding, documents may pass through a transformation pipeline that includes:
- Node parsing: Splitting documents into chunks.
- Metadata extraction: Enriching nodes with additional metadata.
- Text cleaning: Removing noise, normalizing whitespace.
from_documents applies these transformations automatically: it uses the list passed through its transformations argument or, when none is given, the one configured globally on Settings.
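Conceptually, a transformation pipeline is an ordered list of functions, each taking a list of nodes and returning a new list. The plain-Python sketch below illustrates that shape with one step per category above; the dictionary-based nodes, the function names, and the step order are assumptions made for the example and are not LlamaIndex's classes.

```python
import re


def clean_text(nodes):
    """Text cleaning: collapse runs of whitespace into single spaces."""
    return [{**n, "text": re.sub(r"\s+", " ", n["text"]).strip()} for n in nodes]


def split_sentences(nodes):
    """Node parsing: one chunk per sentence (a deliberately naive splitter)."""
    chunks = []
    for n in nodes:
        for sentence in re.split(r"(?<=[.!?])\s+", n["text"]):
            if sentence:
                chunks.append({"text": sentence, "metadata": dict(n["metadata"])})
    return chunks


def add_word_count(nodes):
    """Metadata extraction: attach a simple derived field to each chunk."""
    return [
        {**n, "metadata": {**n["metadata"], "word_count": len(n["text"].split())}}
        for n in nodes
    ]


def run_pipeline(nodes, transformations):
    """Apply each transformation in order, feeding its output to the next."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


documents = [
    {
        "text": "Vectors   live in a store.\n\nQueries are embedded too.",
        "metadata": {"source": "notes.txt"},
    }
]
chunks = run_pipeline(documents, [clean_text, split_sentences, add_word_count])
for chunk in chunks:
    print(chunk)
```

Order matters in such a pipeline: here cleaning runs first so the splitter sees normalized whitespace, and metadata is computed last so it describes the final chunks rather than the original document.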
Relationship to Other Principles
Vector Index Construction sits between Document Loading and Query Execution in the RAG pipeline:
- It receives processed documents/nodes from the loading and transformation stages.
- It produces a queryable index that the retrieval and synthesis stages consume.
- It depends on the Settings configuration for the embedding model and transformation pipeline.
Knowledge Sources
- LlamaIndex Vector Store Index Guide
- LlamaIndex Indexing Documentation
- Efficient and Robust Approximate Nearest Neighbor Search
Implementation
Implementation:Run_llama_Llama_index_VectorStoreIndex_From_Documents