Implementation:Deepset ai Haystack InMemoryDocumentStore
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Data_Storage |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Concrete tool, provided by the Haystack framework, for storing and retrieving documents in memory.
Description
The InMemoryDocumentStore class provides an ephemeral document store that supports both BM25 keyword retrieval and embedding-based vector retrieval. It keeps documents in a module-level (global) dictionary keyed by the store's index name, which defaults to a random UUID. BM25 statistics (token frequencies, document lengths, and the document frequencies needed for IDF) are updated incrementally as documents are written. Embedding retrieval scores the query vector against the stored document vectors using either dot product or cosine similarity.
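The difference between the two similarity options can be illustrated with a minimal pure-Python sketch. It does not reproduce Haystack's internal code; the helper names and the toy vectors are assumptions chosen for illustration.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; grows with vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product after scaling both vectors to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
short_aligned = [1.0, 0.0]  # same direction as the query
long_oblique = [3.0, 3.0]   # 45 degrees off the query, but much longer

# Dot product rewards the longer vector; cosine rewards the aligned one.
dot_scores = (dot_product(query, short_aligned), dot_product(query, long_oblique))
cos_scores = (cosine(query, short_aligned), cosine(query, long_oblique))
```

With unnormalized embeddings the two functions can therefore rank documents differently, which is worth checking against the embedding model in use.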
Usage
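Import this class when you need a lightweight document store for prototyping, testing, or small-scale RAG pipelines. It requires no external database and is the store that Haystack's in-memory retrievers (InMemoryBM25Retriever and InMemoryEmbeddingRetriever) operate on. Because the store is ephemeral, its contents are lost when the process exits.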
Code Reference
Source Location
- Repository: haystack
- File: haystack/document_stores/in_memory/document_store.py
- Lines: L58-123 (constructor)
Signature
class InMemoryDocumentStore:
    def __init__(
        self,
        bm25_tokenization_regex: str = r"(?u)\b\w\w+\b",
        bm25_algorithm: Literal["BM25Okapi", "BM25L", "BM25Plus"] = "BM25L",
        bm25_parameters: dict | None = None,
        embedding_similarity_function: Literal["dot_product", "cosine"] = "dot_product",
        index: str | None = None,
        async_executor: ThreadPoolExecutor | None = None,
        return_embedding: bool = True,
    ):
        """
        Args:
            bm25_tokenization_regex: Regex for tokenizing text for BM25 retrieval.
            bm25_algorithm: BM25 variant - "BM25Okapi", "BM25L", or "BM25Plus".
            bm25_parameters: Parameters for BM25 (e.g., {'k1': 1.5, 'b': 0.75}).
            embedding_similarity_function: "dot_product" or "cosine".
            index: Unique index name. If None, a random UUID is generated.
            async_executor: Optional ThreadPoolExecutor for async operations.
            return_embedding: Whether to return embeddings with retrieved documents.
        """
Import
from haystack.document_stores.in_memory import InMemoryDocumentStore
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bm25_tokenization_regex | str | No | Regex pattern for tokenizing text (default: `(?u)\b\w\w+\b`, which matches tokens of two or more word characters) |
| bm25_algorithm | Literal | No | BM25 variant to use (default: "BM25L") |
| bm25_parameters | dict or None | No | Custom BM25 parameters |
| embedding_similarity_function | Literal | No | Similarity function for embeddings (default: "dot_product") |
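| index | str or None | No | Store index name (default: auto-generated UUID) |
| async_executor | ThreadPoolExecutor or None | No | Executor for async operations (default: None) |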
| return_embedding | bool | No | Include embeddings in retrieval results (default: True) |
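To see what the default `bm25_tokenization_regex` keeps and drops, the pattern can be run directly with Python's `re` module. This is a standalone sketch: the sample sentence is invented, and lowercasing the text before matching is an assumption about preprocessing rather than something the regex itself does.

```python
import re

# Copied from the constructor default shown in the signature.
DEFAULT_BM25_REGEX = r"(?u)\b\w\w+\b"

text = "Paris is a city in France"

# Assumption: text is lowercased before tokenization.
tokens = re.findall(DEFAULT_BM25_REGEX, text.lower())
print(tokens)
```

Single-character words such as "a" do not match because the pattern requires at least two word characters, so they never contribute to BM25 scores under the default setting.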
Outputs
| Name | Type | Description |
|---|---|---|
| instance | InMemoryDocumentStore | Configured document store ready for use with retrievers and writers |
| write_documents() returns | int | Count of documents written |
| bm25_retrieval() returns | list[Document] | Documents matching BM25 query |
| embedding_retrieval() returns | list[Document] | Documents matching embedding similarity |
Usage Examples
Basic Usage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
# Create store with default settings
document_store = InMemoryDocumentStore()
# Write documents
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Rome is the capital of Italy."),
]
document_store.write_documents(docs)
print(f"Documents stored: {document_store.count_documents()}")
With Custom BM25 and Cosine Similarity
from haystack.document_stores.in_memory import InMemoryDocumentStore
# Create store with cosine similarity for embeddings
document_store = InMemoryDocumentStore(
    bm25_algorithm="BM25Okapi",
    bm25_parameters={"k1": 1.5, "b": 0.75},
    embedding_similarity_function="cosine",
)
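As a rough guide to what `k1` and `b` control, the sketch below evaluates the textbook Okapi BM25 term-frequency component in plain Python. It is not Haystack's internal implementation, which may differ in detail between variants; the function name `okapi_term_weight` is hypothetical and the IDF factor is deliberately left out.

```python
def okapi_term_weight(
    tf: int,
    doc_len: int,
    avg_doc_len: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Textbook Okapi BM25 term-frequency component (IDF factor omitted)."""
    # b controls how strongly document length is normalized (0 = not at all).
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of a term saturate.
    return tf * (k1 + 1) / (tf + k1 * length_norm)


# Average-length document: more occurrences help, with diminishing returns.
w1 = okapi_term_weight(tf=1, doc_len=100, avg_doc_len=100)
w5 = okapi_term_weight(tf=5, doc_len=100, avg_doc_len=100)

# Same term count in a document twice the average length scores lower.
w_long = okapi_term_weight(tf=1, doc_len=200, avg_doc_len=100)
```

In this formulation the weight never exceeds `k1 + 1`, so raising `k1` lets repeated terms matter more, while lowering `b` toward 0 reduces the penalty on long documents.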