Implementation:Deepset ai Haystack InMemoryDocumentStore

From Leeroopedia
Knowledge Sources
Domains Information_Retrieval, Data_Storage
Last Updated 2026-02-11 00:00 GMT

Overview

A concrete tool, provided by the Haystack framework, for storing and retrieving documents in memory.

Description

The InMemoryDocumentStore class provides an ephemeral document store that supports both BM25 keyword retrieval and embedding-based vector retrieval. Documents are kept in a global dictionary keyed by the store's index name (a random UUID by default). BM25 statistics (token frequencies, document lengths, IDF) are updated incrementally as documents are written. Embedding retrieval scores the query vector against the stored document vectors using either dot product or cosine similarity.
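
To make the difference between the two embedding similarity options concrete, here is a minimal pure-Python sketch. It is not Haystack's internal code: the helper names (`dot_product`, `cosine`) and the toy vectors are assumptions chosen for illustration. It shows that dot product rewards vector magnitude while cosine compares direction only, so the two functions can rank the same documents differently.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by both vector norms; depends on direction only."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


# Hypothetical toy vectors: one document points exactly along the query,
# the other is longer but points in a different direction.
query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]
long_doc = [3.0, 3.0]

dot_scores = {
    "aligned_doc": dot_product(query, aligned_doc),
    "long_doc": dot_product(query, long_doc),
}
cosine_scores = {
    "aligned_doc": cosine(query, aligned_doc),
    "long_doc": cosine(query, long_doc),
}

# Dot product prefers the longer vector; cosine prefers the aligned one.
print("dot product:", dot_scores)
print("cosine:", cosine_scores)
```

When embeddings are already normalized to unit length, both functions produce the same ranking, so the choice matters mainly for unnormalized vectors.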

Usage

Import this class when you need a lightweight document store for prototyping, testing, or small-scale RAG pipelines. It requires no external database and is the store that Haystack's in-memory retrievers (InMemoryBM25Retriever and InMemoryEmbeddingRetriever) operate on.

Code Reference

Source Location

  • Repository: haystack
  • File: haystack/document_stores/in_memory/document_store.py
  • Lines: L58-123 (constructor)

Signature

class InMemoryDocumentStore:
    def __init__(
        self,
        bm25_tokenization_regex: str = r"(?u)\b\w\w+\b",
        bm25_algorithm: Literal["BM25Okapi", "BM25L", "BM25Plus"] = "BM25L",
        bm25_parameters: dict | None = None,
        embedding_similarity_function: Literal["dot_product", "cosine"] = "dot_product",
        index: str | None = None,
        async_executor: ThreadPoolExecutor | None = None,
        return_embedding: bool = True,
    ):
        """
        Args:
            bm25_tokenization_regex: Regex for tokenizing text for BM25 retrieval.
            bm25_algorithm: BM25 variant - "BM25Okapi", "BM25L", or "BM25Plus".
            bm25_parameters: Parameters for BM25 (e.g., {'k1':1.5, 'b':0.75}).
            embedding_similarity_function: "dot_product" or "cosine".
            index: Unique index name. If None, a random UUID is generated.
            async_executor: Optional ThreadPoolExecutor for async operations.
            return_embedding: Whether to return embeddings with retrieved documents.
        """

Import

from haystack.document_stores.in_memory import InMemoryDocumentStore

I/O Contract

Inputs

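Name | Type | Required | Description
bm25_tokenization_regex | str | No | Regex used to tokenize text for BM25 (default matches runs of two or more word characters)
bm25_algorithm | Literal | No | BM25 variant to use: "BM25Okapi", "BM25L", or "BM25Plus" (default: "BM25L")
bm25_parameters | dict or None | No | Custom BM25 parameters, e.g. {"k1": 1.5, "b": 0.75}
embedding_similarity_function | Literal | No | Similarity function for embeddings: "dot_product" or "cosine" (default: "dot_product")
index | str or None | No | Store index name (default: auto-generated UUID)
async_executor | ThreadPoolExecutor or None | No | Optional executor for async operations
return_embedding | bool | No | Include embeddings in retrieval results (default: True)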
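
The effect of the default tokenization regex is easiest to see by running it. The sketch below uses only Python's `re` module; the `tokenize` helper and the lowercasing step are assumptions made for illustration and may not match the store's internal preprocessing exactly. It shows that the default pattern discards single-character tokens, while a hypothetical alternative pattern keeps them.

```python
import re

# Default value of bm25_tokenization_regex, copied from the signature.
DEFAULT_BM25_REGEX = r"(?u)\b\w\w+\b"

# Hypothetical alternative that also keeps one-character tokens.
KEEP_SINGLE_CHARS_REGEX = r"(?u)\b\w+\b"


def tokenize(text: str, pattern: str = DEFAULT_BM25_REGEX) -> list[str]:
    """Lowercase the text (an assumption) and return all regex matches."""
    return re.findall(pattern, text.lower())


sentence = "A cat sat on a mat in Paris, France."

default_tokens = tokenize(sentence)
all_tokens = tokenize(sentence, KEEP_SINGLE_CHARS_REGEX)

print(default_tokens)  # "a" is dropped because it is a single character
print(all_tokens)      # "a" is kept by the alternative pattern
```

A custom pattern like the second one could be passed as `bm25_tokenization_regex` when single-character tokens carry meaning, for example in text containing short identifiers.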

Outputs

Name | Type | Description
instance | InMemoryDocumentStore | Configured document store ready for use with retrievers and writers
write_documents() | int | Count of documents written
bm25_retrieval() | list[Document] | Documents matching the BM25 query
embedding_retrieval() | list[Document] | Documents ranked by embedding similarity

Usage Examples

Basic Usage

from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document

# Create store with default settings
document_store = InMemoryDocumentStore()

# Write documents
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Rome is the capital of Italy."),
]
document_store.write_documents(docs)

print(f"Documents stored: {document_store.count_documents()}")

With Custom BM25 and Cosine Similarity

from haystack.document_stores.in_memory import InMemoryDocumentStore

# Create store with cosine similarity for embeddings
document_store = InMemoryDocumentStore(
    bm25_algorithm="BM25Okapi",
    bm25_parameters={"k1": 1.5, "b": 0.75},
    embedding_similarity_function="cosine",
)
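
To give a feel for what `k1` and `b` control, the following sketch computes the term-frequency part of the textbook Okapi BM25 formula in plain Python. It is an illustration under stated assumptions, not the store's implementation: the function name `okapi_term_weight` is hypothetical, the IDF factor is omitted, and the BM25L and BM25Plus variants modify this formula.

```python
def okapi_term_weight(
    tf: int,
    doc_len: int,
    avg_doc_len: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Term-frequency component of textbook Okapi BM25 (IDF omitted).

    k1 controls how quickly repeated occurrences of a term saturate.
    b controls how strongly the weight is normalized by document length
    (0 disables length normalization, 1 applies it fully).
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Same term frequency in a short and a long document (hypothetical lengths).
short_doc = okapi_term_weight(tf=2, doc_len=50, avg_doc_len=100)
long_doc = okapi_term_weight(tf=2, doc_len=200, avg_doc_len=100)

# With b=0 the document length no longer affects the weight.
short_doc_no_norm = okapi_term_weight(tf=2, doc_len=50, avg_doc_len=100, b=0.0)
long_doc_no_norm = okapi_term_weight(tf=2, doc_len=200, avg_doc_len=100, b=0.0)

print(short_doc, long_doc)
print(short_doc_no_norm, long_doc_no_norm)
```

With the default-style values above, the same term counts for more in the shorter document; setting `b` to 0 removes that length effect.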
