Implementation:Run llama Llama index IngestionPipeline Init
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, RAG, Pipeline_Architecture |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
The IngestionPipeline constructor assembles a document processing pipeline from a list of transformations and optional infrastructure components (vector store, docstore, cache).
Description
IngestionPipeline is a Pydantic BaseModel subclass that validates and stores the pipeline configuration at construction time. The constructor accepts an ordered list of TransformComponent instances that defines the processing chain (each step receives the nodes produced by the previous one), along with optional components for persistence, deduplication, and vector storage.
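The ordered-chain behavior can be pictured with a small stdlib-only model. This is a sketch, not llama_index code: `Node`, `split_sentences`, `add_length_metadata`, and `run_chain` are hypothetical stand-ins. It assumes only that, as with `TransformComponent.__call__`, each step takes a sequence of nodes and returns a new sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a llama_index node: text plus metadata."""
    text: str
    metadata: Dict[str, int] = field(default_factory=dict)


# A transformation takes the current list of nodes and returns a new list.
Transform = Callable[[List[Node]], List[Node]]


def split_sentences(nodes: List[Node]) -> List[Node]:
    """Hypothetical splitter: one output node per '.'-terminated sentence."""
    out: List[Node] = []
    for node in nodes:
        for sentence in node.text.split("."):
            if sentence.strip():
                out.append(Node(sentence.strip() + ".", dict(node.metadata)))
    return out


def add_length_metadata(nodes: List[Node]) -> List[Node]:
    """Hypothetical extractor: records each node's character count."""
    return [Node(n.text, {**n.metadata, "length": len(n.text)}) for n in nodes]


def run_chain(nodes: List[Node], transformations: List[Transform]) -> List[Node]:
    # Each step consumes the previous step's output, so list order matters.
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


source = [Node("Alpha beta. Gamma.")]
result = run_chain(source, [split_sentences, add_length_metadata])
reordered = run_chain(source, [add_length_metadata, split_sentences])
print([(n.text, n.metadata["length"]) for n in result])     # [('Alpha beta.', 11), ('Gamma.', 6)]
print([(n.text, n.metadata["length"]) for n in reordered])  # [('Alpha beta.', 18), ('Gamma.', 18)]
```

Swapping the two steps makes the extractor measure the unsplit text (18 characters) and copies that value onto both chunks, which is why the order of the transformations list matters.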
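When a vector_store is provided, the nodes produced by the final transformation are inserted into it automatically. Only nodes that carry an embedding are added, so the transformation list should include an embedding model. When a docstore is provided, the pipeline applies the selected docstore_strategy to detect duplicate or changed documents and decide which ones to (re)process. The cache stores each transformation's output keyed by a hash of the input nodes' content combined with the transformation's configuration, so re-running an unchanged step on unchanged input reuses the stored result.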
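The caching rule can be illustrated with a hypothetical stdlib model. `transformation_hash`, `ChunkStep`, and `run_step` are illustrative names, not llama_index APIs, and the key used here (SHA-256 over the input text plus a repr of the step's parameters) only approximates how the library derives a key from node content and the transformation's serialized form.

```python
import hashlib
from typing import Dict, List


def transformation_hash(texts: List[str], config: Dict[str, object]) -> str:
    """Hypothetical cache key: SHA-256 over input content plus step config."""
    payload = "".join(texts) + repr(sorted(config.items()))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ChunkStep:
    """Hypothetical transformation that cuts text into fixed-size chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.calls = 0  # number of real executions (cache misses)

    def config(self) -> Dict[str, object]:
        # Stands in for the step's identity: class name plus parameters.
        return {"name": type(self).__name__, "chunk_size": self.chunk_size}

    def __call__(self, texts: List[str]) -> List[str]:
        self.calls += 1
        size = self.chunk_size
        return [t[i:i + size] for t in texts for i in range(0, len(t), size)]


def run_step(step: ChunkStep, texts: List[str], cache: Dict[str, List[str]],
             disable_cache: bool = False) -> List[str]:
    if disable_cache:
        return step(texts)  # always recompute; never read or write the cache
    key = transformation_hash(texts, step.config())
    if key not in cache:
        cache[key] = step(texts)  # miss: compute and store
    return cache[key]  # hit (or freshly stored): reuse the output


cache: Dict[str, List[str]] = {}
step = ChunkStep(chunk_size=4)
docs = ["abcdefgh"]
first = run_step(step, docs, cache)   # miss
second = run_step(step, docs, cache)  # hit: the step is not executed again
print(first, step.calls)              # ['abcd', 'efgh'] 1
step.chunk_size = 3                   # new config, therefore a new key
third = run_step(step, docs, cache)   # miss
print(third, step.calls)              # ['abc', 'def', 'gh'] 2
```

Including the step's parameters in the key is what keeps a changed chunk size from silently returning stale chunks.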
Usage
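Create an IngestionPipeline instance with an explicit transformations list (the parameter is optional in the signature, but an explicit list keeps the processing chain from depending on library defaults). Add a vector_store for automatic storage and a docstore for deduplication, and set disable_cache=True to turn off transformation caching.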
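Because automatic storage runs only after the last transformation, an embedding step belongs in the transformations list whenever a vector_store is set. The sketch below models that flow with hypothetical names (`InMemoryVectorStore`, `fake_embed`, `run_pipeline`); it assumes, as in llama_index's `run()`, that only nodes whose embedding is set are passed to the vector store's `add()`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical node: text plus an optional embedding vector."""
    text: str
    embedding: Optional[List[float]] = None


class InMemoryVectorStore:
    """Hypothetical vector store that simply records the nodes it receives."""

    def __init__(self) -> None:
        self.rows: List[Node] = []

    def add(self, nodes: List[Node]) -> None:
        self.rows.extend(nodes)


def fake_embed(nodes: List[Node]) -> List[Node]:
    """Hypothetical embedding step: vector = (length, vowel count)."""
    return [
        Node(n.text, [float(len(n.text)), float(sum(c in "aeiou" for c in n.text))])
        for n in nodes
    ]


def run_pipeline(
    nodes: List[Node],
    transformations: List[Callable[[List[Node]], List[Node]]],
    vector_store: Optional[InMemoryVectorStore] = None,
) -> List[Node]:
    for transform in transformations:
        nodes = transform(nodes)
    # Storage happens once, after the last step, and skips unembedded nodes.
    if vector_store is not None:
        embedded = [n for n in nodes if n.embedding is not None]
        if embedded:
            vector_store.add(embedded)
    return nodes


store = InMemoryVectorStore()
run_pipeline([Node("hello")], [], store)            # no embedding step
stored_without_embedding = len(store.rows)          # nothing was stored
run_pipeline([Node("hello")], [fake_embed], store)  # embedding step present
print(stored_without_embedding, store.rows[0].embedding)  # 0 [5.0, 2.0]
```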
Code Reference
Source Location
- Repository: llama_index
- File: llama-index-core/llama_index/core/ingestion/pipeline.py
- Lines: L205-364
Signature
```python
class IngestionPipeline(BaseModel):
    def __init__(
        self,
        name: str = DEFAULT_PIPELINE_NAME,
        project_name: str = DEFAULT_PROJECT_NAME,
        transformations: Optional[List[TransformComponent]] = None,
        readers: Optional[List[ReaderConfig]] = None,
        documents: Optional[Sequence[Document]] = None,
        vector_store: Optional[BasePydanticVectorStore] = None,
        cache: Optional[IngestionCache] = None,
        docstore: Optional[BaseDocumentStore] = None,
        docstore_strategy: DocstoreStrategy = DocstoreStrategy.UPSERTS,
        disable_cache: bool = False,
    ) -> None:
        ...
```
Import
```python
from llama_index.core.ingestion import IngestionPipeline
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | No (default: DEFAULT_PIPELINE_NAME) | Name identifier for the pipeline |
| project_name | str | No (default: DEFAULT_PROJECT_NAME) | Project name for organizational grouping |
| transformations | Optional[List[TransformComponent]] | No | Ordered list of transformation steps to apply to documents |
| readers | Optional[List[ReaderConfig]] | No | Reader configurations for loading documents from sources |
| documents | Optional[Sequence[Document]] | No | Pre-loaded documents to process |
| vector_store | Optional[BasePydanticVectorStore] | No | Vector store for automatic node insertion after processing |
| cache | Optional[IngestionCache] | No | Cache for storing intermediate transformation results |
| docstore | Optional[BaseDocumentStore] | No | Document store for tracking ingested documents and deduplication |
| docstore_strategy | DocstoreStrategy | No (default: UPSERTS) | Strategy for handling duplicate documents when docstore is present |
| disable_cache | bool | No (default: False) | Set to True to disable transformation caching |
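The three docstore_strategy values can be compared with the simplified model below. `Strategy` mirrors the member names of DocstoreStrategy, but `dedupe` and `content_hash` are hypothetical helpers: a plain dict (doc_id to content hash) stands in for the docstore, and vector-store deletions are returned as a list of ids instead of being executed. The real pipeline compares node hashes keyed by ref_doc_id, so treat this as an approximation of the behavior, not the library's code.

```python
import hashlib
from enum import Enum
from typing import Dict, List, Tuple


class Strategy(str, Enum):
    """Mirrors the member names of llama_index's DocstoreStrategy."""
    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe(docs: Dict[str, str], docstore: Dict[str, str],
           strategy: Strategy) -> Tuple[List[str], List[str]]:
    """Return (doc_ids to process, doc_ids to delete); mutates docstore."""
    to_run: List[str] = []
    to_delete: List[str] = []
    if strategy is Strategy.DUPLICATES_ONLY:
        seen = set(docstore.values())
        for doc_id, text in docs.items():
            h = content_hash(text)
            if h not in seen:  # skip content already ingested under any id
                seen.add(h)
                docstore[doc_id] = h
                to_run.append(doc_id)
        return to_run, to_delete
    for doc_id, text in docs.items():
        h = content_hash(text)
        old = docstore.get(doc_id)
        if old == h:
            continue  # unchanged: skip
        if old is not None:
            to_delete.append(doc_id)  # changed: drop stale entries first
        docstore[doc_id] = h
        to_run.append(doc_id)
    if strategy is Strategy.UPSERTS_AND_DELETE:
        for doc_id in list(docstore):
            if doc_id not in docs:  # absent from this batch: remove it
                del docstore[doc_id]
                to_delete.append(doc_id)
    return to_run, to_delete


store: Dict[str, str] = {}
run1 = dedupe({"a": "v1", "b": "x"}, store, Strategy.UPSERTS)  # (['a', 'b'], [])
run2 = dedupe({"a": "v2"}, store, Strategy.UPSERTS)            # (['a'], ['a'])
run3 = dedupe({"a": "v2"}, store, Strategy.UPSERTS_AND_DELETE) # ([], ['b'])
print(run1, run2, run3)
```

Under UPSERTS, unchanged documents are skipped and changed ones are re-processed after their stale entries are dropped; UPSERTS_AND_DELETE additionally removes documents missing from the current batch, while DUPLICATES_ONLY ignores document ids and skips any content it has already seen.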
Outputs
| Name | Type | Description |
|---|---|---|
| return | IngestionPipeline | Configured pipeline instance ready for execution via run() or arun() |
Usage Examples
Minimal Pipeline
```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
    ],
)
```
Full Pipeline with Vector Store and Deduplication
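```python
import chromadb
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.extractors import TitleExtractor
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# In-memory Chroma collection; use chromadb.PersistentClient(path=...) to keep data on disk.
chroma_client = chromadb.EphemeralClient()
collection = chroma_client.get_or_create_collection("my_rag_collection")

# TitleExtractor calls an LLM (Settings.llm) and OpenAIEmbedding calls the OpenAI API,
# so OPENAI_API_KEY must be set before the pipeline runs.
pipeline = IngestionPipeline(
    name="my_rag_pipeline",
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),
        TitleExtractor(),
        OpenAIEmbedding(),  # last step, so nodes carry embeddings when stored
    ],
    vector_store=ChromaVectorStore(chroma_collection=collection),
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)

# A stable id_ lets the UPSERTS strategy recognize changed versions on later runs.
nodes = pipeline.run(documents=[Document(text="LlamaIndex ingestion example.", id_="doc-1")])
```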