Implementation:Run llama Llama index IngestionPipeline Init

From Leeroopedia
Knowledge Sources
Domains Data_Ingestion, RAG, Pipeline_Architecture
Last Updated 2026-02-11 00:00 GMT

Overview

The IngestionPipeline constructor assembles a document processing pipeline from a list of transformations and optional infrastructure components (vector store, docstore, cache).

Description

IngestionPipeline is a BaseModel (Pydantic) class that validates and stores the pipeline configuration at construction time. The constructor accepts an ordered list of TransformComponent instances that define the processing chain, along with optional components for persistence, deduplication, and vector storage.
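
The idea of an ordered processing chain can be sketched in plain Python. This is only an illustration of the pattern, not the library's implementation: Node, split_on_periods, uppercase, and run_chain are hypothetical stand-ins, and each "transformation" is assumed to be a callable that takes a list of nodes and returns a new list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a pipeline node; holds only text."""
    text: str


# Hypothetical stand-in for a transformation: nodes in, nodes out.
Transform = Callable[[List[Node]], List[Node]]


def split_on_periods(nodes: List[Node]) -> List[Node]:
    """Split each node into one node per non-empty period-delimited fragment."""
    out: List[Node] = []
    for node in nodes:
        for part in node.text.split("."):
            if part.strip():
                out.append(Node(part.strip()))
    return out


def uppercase(nodes: List[Node]) -> List[Node]:
    """Return new nodes with upper-cased text."""
    return [Node(n.text.upper()) for n in nodes]


def run_chain(nodes: List[Node], transformations: List[Transform]) -> List[Node]:
    """Apply each transformation in list order, feeding output to the next step."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


result = run_chain([Node("first point. second point.")], [split_on_periods, uppercase])
print([n.text for n in result])  # ['FIRST POINT', 'SECOND POINT']
```

Order matters in such a chain: a step that depends on the output of another (for example, embedding after splitting) has to come later in the list.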

When a vector_store is provided, processed nodes that carry embeddings are inserted automatically after all transformations complete, so the transformations list should end with an embedding step. When a docstore is provided, the pipeline applies the selected docstore_strategy to detect and handle duplicate documents. The cache stores intermediate transformation results keyed by a hash of the input content combined with the identity of the transformation.
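
To make the caching rule concrete, here is a minimal sketch of a cache keyed by input content hash plus transformation identity. It is an assumption-laden illustration rather than the library's code: cache_key, TinyCache, and the "upper-v1" style identifiers are hypothetical names introduced for this example.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(texts: List[str], transform_id: str) -> str:
    """Hash the input content together with an identifier for the transformation."""
    digest = hashlib.sha256()
    for text in texts:
        digest.update(text.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] and ["a", "bc"] differ
    digest.update(transform_id.encode("utf-8"))
    return digest.hexdigest()


class TinyCache:
    """Hypothetical in-memory cache of transformation results."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}
        self.misses = 0

    def apply(
        self,
        texts: List[str],
        transform: Callable[[List[str]], List[str]],
        transform_id: str,
    ) -> List[str]:
        key = cache_key(texts, transform_id)
        if key not in self._store:
            # Cache miss: run the transformation and remember its output.
            self.misses += 1
            self._store[key] = transform(texts)
        return self._store[key]


def upper(texts: List[str]) -> List[str]:
    return [t.upper() for t in texts]


cache = TinyCache()
first = cache.apply(["hello"], upper, "upper-v1")   # miss: computed
second = cache.apply(["hello"], upper, "upper-v1")  # hit: same content, same identity
third = cache.apply(["hello"], upper, "upper-v2")   # miss: identity changed
print(cache.misses)  # 2
```

Including the transformation identity in the key means that changing a step's configuration invalidates its cached results even when the input content is unchanged.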

Usage

Create an IngestionPipeline instance with, at minimum, a transformations list. Add a vector_store to have processed nodes stored automatically, add a docstore to enable deduplication, and set disable_cache=True to turn off transformation caching.
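
As a rough mental model of docstore-based deduplication, the sketch below keeps a mapping from document ID to content hash and reprocesses only new or changed documents, which is the general idea behind an upsert-style strategy. It is a simplified illustration, not the library's logic: content_hash, filter_upserts, and the seen dictionary are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """Return a stable hash of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_upserts(docs: List[Tuple[str, str]], seen: Dict[str, str]) -> List[str]:
    """Return the IDs of documents that are new or whose content changed.

    `docs` holds (doc_id, text) pairs. `seen` maps doc_id to the stored hash
    and is updated in place, playing the role of a very small docstore.
    """
    to_process: List[str] = []
    for doc_id, text in docs:
        new_hash = content_hash(text)
        if seen.get(doc_id) == new_hash:
            continue  # unchanged since the last run: skip
        seen[doc_id] = new_hash  # new or changed: record and reprocess
        to_process.append(doc_id)
    return to_process


seen: Dict[str, str] = {}
first_run = filter_upserts([("a", "alpha"), ("b", "beta")], seen)
second_run = filter_upserts([("a", "alpha"), ("b", "beta v2")], seen)
print(first_run, second_run)  # ['a', 'b'] ['b']
```

On the second run only document "b" is reprocessed, because its text changed while "a" kept the same hash.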

Code Reference

Source Location

  • Repository: llama_index
  • File: llama-index-core/llama_index/core/ingestion/pipeline.py
  • Lines: L205-364

Signature

class IngestionPipeline(BaseModel):
    def __init__(
        self,
        name: str = DEFAULT_PIPELINE_NAME,
        project_name: str = DEFAULT_PROJECT_NAME,
        transformations: Optional[List[TransformComponent]] = None,
        readers: Optional[List[ReaderConfig]] = None,
        documents: Optional[Sequence[Document]] = None,
        vector_store: Optional[BasePydanticVectorStore] = None,
        cache: Optional[IngestionCache] = None,
        docstore: Optional[BaseDocumentStore] = None,
        docstore_strategy: DocstoreStrategy = DocstoreStrategy.UPSERTS,
        disable_cache: bool = False,
    ) -> None:

Import

from llama_index.core.ingestion import IngestionPipeline

I/O Contract

Inputs

Name | Type | Required | Description
name | str | No (default: DEFAULT_PIPELINE_NAME) | Name identifier for the pipeline
project_name | str | No (default: DEFAULT_PROJECT_NAME) | Project name for organizational grouping
transformations | Optional[List[TransformComponent]] | No | Ordered list of transformation steps to apply to documents
readers | Optional[List[ReaderConfig]] | No | Reader configurations for loading documents from sources
documents | Optional[Sequence[Document]] | No | Pre-loaded documents to process
vector_store | Optional[BasePydanticVectorStore] | No | Vector store for automatic node insertion after processing
cache | Optional[IngestionCache] | No | Cache for storing intermediate transformation results
docstore | Optional[BaseDocumentStore] | No | Document store for tracking ingested documents and deduplication
docstore_strategy | DocstoreStrategy | No (default: UPSERTS) | Strategy for handling duplicate documents when docstore is present
disable_cache | bool | No (default: False) | Set to True to disable transformation caching

Outputs

Name | Type | Description
return | IngestionPipeline | Configured pipeline instance ready for execution via run() or arun()

Usage Examples

Minimal Pipeline

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
    ],
)

Full Pipeline with Vector Store and Deduplication

import chromadb
from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.extractors import TitleExtractor
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Assumption: an in-memory Chroma collection; the collection name is a placeholder.
chroma_client = chromadb.EphemeralClient()
collection = chroma_client.get_or_create_collection("my_rag_pipeline")

pipeline = IngestionPipeline(
    name="my_rag_pipeline",
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),  # split documents into chunks
        TitleExtractor(),  # add title metadata (uses an LLM)
        OpenAIEmbedding(),  # embed last so nodes can go into the vector store
    ],
    vector_store=ChromaVectorStore(chroma_collection=collection),
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)

Related Pages

Implements Principle
