Implementation:Run llama Llama index IngestionPipeline Init

Knowledge Sources	LlamaIndex
Domains	Data_Ingestion, RAG, Pipeline_Architecture
Last Updated	2026-02-11 00:00 GMT

Overview

The IngestionPipeline constructor assembles a document processing pipeline from a list of transformations and optional infrastructure components (vector store, docstore, cache).

Description

IngestionPipeline is a Pydantic BaseModel subclass that validates and stores the pipeline configuration at construction time. The constructor accepts an ordered list of TransformComponent instances that define the processing chain, along with optional components for persistence, deduplication, and vector storage.

When a vector_store is provided, processed nodes are automatically inserted after all transformations complete. When a docstore is provided, the pipeline applies the selected docstore_strategy to detect and handle duplicate documents. The cache stores intermediate transformation results keyed by input content hash and transformation identity.
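The cache keying described above can be pictured with a small standalone sketch. It does not use LlamaIndex: the names `cache_key` and `run_cached` and the config dictionary are hypothetical, and the real IngestionCache may compute its keys differently. The point is only that identical input under identical transformation settings is computed once.

```python
import hashlib

# Illustrative in-memory cache; a stand-in, not the LlamaIndex IngestionCache.
cache = {}


def cache_key(node_texts, transformation_config):
    """Hash the input content together with the transformation's settings."""
    content = "".join(node_texts)
    identity = repr(sorted(transformation_config.items()))
    return hashlib.sha256((content + identity).encode("utf-8")).hexdigest()


def run_cached(node_texts, transformation_config, transform):
    """Run `transform` only if this (content, settings) pair is unseen."""
    key = cache_key(node_texts, transformation_config)
    if key not in cache:
        cache[key] = transform(node_texts)
    return cache[key]


calls = []  # records how many times the transformation really ran


def upper(texts):
    calls.append(1)
    return [t.upper() for t in texts]


config = {"class_name": "Upper", "version": 1}
first = run_cached(["a", "b"], config, upper)   # computed
second = run_cached(["a", "b"], config, upper)  # served from the cache
third = run_cached(["a", "c"], config, upper)   # different content, recomputed
```

Changing either the input text or any value in the config dictionary produces a different key, so a reconfigured transformation does not reuse stale results.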

Usage

Create an IngestionPipeline instance by passing a transformations list, which defines the processing chain (the parameter is optional in the signature). Add a vector_store for automatic storage and a docstore for deduplication, and set disable_cache=True if caching is not desired.

Code Reference

Source Location

Repository: llama_index
File: llama-index-core/llama_index/core/ingestion/pipeline.py
Lines: L205-364

Signature

class IngestionPipeline(BaseModel):
    def __init__(
        self,
        name: str = DEFAULT_PIPELINE_NAME,
        project_name: str = DEFAULT_PROJECT_NAME,
        transformations: Optional[List[TransformComponent]] = None,
        readers: Optional[List[ReaderConfig]] = None,
        documents: Optional[Sequence[Document]] = None,
        vector_store: Optional[BasePydanticVectorStore] = None,
        cache: Optional[IngestionCache] = None,
        docstore: Optional[BaseDocumentStore] = None,
        docstore_strategy: DocstoreStrategy = DocstoreStrategy.UPSERTS,
        disable_cache: bool = False,
    ) -> None:

Import

from llama_index.core.ingestion import IngestionPipeline

I/O Contract

Inputs

Name	Type	Required	Description
name	str	No (default: DEFAULT_PIPELINE_NAME)	Name identifier for the pipeline
project_name	str	No (default: DEFAULT_PROJECT_NAME)	Project name for organizational grouping
transformations	Optional[List[TransformComponent]]	No	Ordered list of transformation steps to apply to documents
readers	Optional[List[ReaderConfig]]	No	Reader configurations for loading documents from sources
documents	Optional[Sequence[Document]]	No	Pre-loaded documents to process
vector_store	Optional[BasePydanticVectorStore]	No	Vector store for automatic node insertion after processing
cache	Optional[IngestionCache]	No	Cache for storing intermediate transformation results
docstore	Optional[BaseDocumentStore]	No	Document store for tracking ingested documents and deduplication
docstore_strategy	DocstoreStrategy	No (default: UPSERTS)	Strategy for handling duplicate documents when docstore is present
disable_cache	bool	No (default: False)	Set to True to disable transformation caching

Outputs

Name	Type	Description
return	IngestionPipeline	Configured pipeline instance ready for execution via run() or arun()

Usage Examples

Minimal Pipeline

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
    ],
)

Full Pipeline with Vector Store and Deduplication

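import chromadb
from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.extractors import TitleExtractor
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Assumption: an in-memory Chroma collection; the collection name is illustrative.
chroma_client = chromadb.EphemeralClient()
collection = chroma_client.get_or_create_collection("my_rag_collection")

# TitleExtractor and OpenAIEmbedding call external models, so an OpenAI API key
# is assumed to be configured in the environment.
pipeline = IngestionPipeline(
    name="my_rag_pipeline",
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),  # split into chunks
        TitleExtractor(),  # add title metadata to each chunk
        OpenAIEmbedding(),  # embed last, before vector store insertion
    ],
    vector_store=ChromaVectorStore(chroma_collection=collection),
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)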
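The idea behind pairing a docstore with an upsert-style strategy can be sketched without LlamaIndex. The snippet below is a simplified, standalone illustration: `content_hash` and `select_for_processing` are hypothetical helpers, and the actual docstore bookkeeping in the library is more involved. It shows only the core decision: a document is reprocessed when its ID is new or its content hash has changed.

```python
import hashlib


def content_hash(text):
    """Fingerprint a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_for_processing(docs, seen_hashes):
    """Return IDs of documents that are new or whose content changed.

    docs: mapping of doc_id -> text
    seen_hashes: mapping of doc_id -> last stored hash (updated in place)
    """
    to_process = []
    for doc_id, text in docs.items():
        new_hash = content_hash(text)
        if seen_hashes.get(doc_id) != new_hash:
            seen_hashes[doc_id] = new_hash  # record the latest version
            to_process.append(doc_id)
    return to_process


seen = {}  # stands in for the docstore's ID-to-hash records
first_run = select_for_processing({"a": "alpha", "b": "beta"}, seen)
second_run = select_for_processing({"a": "alpha", "b": "beta v2"}, seen)
```

On the first run both documents are selected; on the second run only "b" is selected, because "a" is unchanged and would be skipped.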

Related Pages

Implements Principle

Principle:Run_llama_Llama_index_Ingestion_Pipeline_Construction
