Implementation:Hpcaitech ColossalAI CustomStuffDocumentsChain

Knowledge Sources	Hpcaitech_ColossalAI
Domains	NLP, Question Answering, Document Processing
Last Updated	2026-02-09 00:00 GMT

Overview

CustomStuffDocumentsChain is a custom extension of LangChain's StuffDocumentsChain. It stuffs documents into a single context string, labels each one with a numbered prefix, and can substitute key-value metadata for a document's page content.

Description

This class overrides the _get_inputs method of LangChain's StuffDocumentsChain to add custom document formatting logic. Each document is prefixed with a numbered label (e.g., "Supporting Document0:", "Supporting Document1:"). The class also supports key-value mapping: when the is_key_value_mapping flag is set, the document's page content is replaced by a value taken from its metadata. The formatted documents are then joined into a single string and passed to the LLM chain for processing.
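The formatting step described above can be illustrated with a small standalone sketch. It does not depend on LangChain and is not the ColossalQA source. Several names and details are assumptions made for illustration: the Doc dataclass stands in for LangChain's Document, the metadata key "value", the ": " after each numbered label, and the "\n\n" join separator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def stuff_documents(
    docs: List[Doc],
    doc_prefix: str = "Supporting Document",
    separator: str = "\n\n",  # assumption: separator used to join documents
) -> str:
    """Sketch of numbered-prefix stuffing with optional key-value substitution."""
    parts = []
    for i, doc in enumerate(docs):
        content = doc.page_content
        # Assumption: a truthy "is_key_value_mapping" flag in metadata means the
        # page content is only a lookup key and the text to show is under "value".
        if doc.metadata.get("is_key_value_mapping"):
            content = doc.metadata["value"]
        # Number each document with its position in the list: prefix0, prefix1, ...
        parts.append(f"{doc_prefix}{i}: {content}")
    return separator.join(parts)


docs = [
    Doc("ColossalAI is a distributed training framework."),
    Doc(
        "What does ColossalAI do?",
        {"is_key_value_mapping": True, "value": "It scales deep learning training."},
    ),
]
print(stuff_documents(docs))
```

In this sketch, the second document's key text ("What does ColossalAI do?") never reaches the context. Only its mapped value is stuffed in, under the label "Supporting Document1:".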

Usage

Use CustomStuffDocumentsChain when building a ColossalQA retrieval pipeline that needs custom document formatting: numbered prefixes that make each source easy to tell apart in the context, or key-value substitution of document content before the documents are fed into the LLM.

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalQA/colossalqa/chain/retrieval_qa/stuff.py
Lines: 1-92

Signature

class CustomStuffDocumentsChain(StuffDocumentsChain):
    def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
        ...

Import

from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain

I/O Contract

Inputs

Name	Type	Required	Description
docs	List[Document]	Yes	List of LangChain Document objects to format and combine into a single input string
doc_prefix	str	No	Prefix label for each document in the combined context (default: "Supporting Document")
stop	list	No	Stop token list passed through kwargs to the LLM chain
temperature	float	No	Temperature parameter passed through kwargs to the LLM chain
top_k	int	No	Top-k sampling parameter passed through kwargs to the LLM chain
top_p	float	No	Top-p sampling parameter passed through kwargs to the LLM chain
max_new_tokens	int	No	Maximum new tokens parameter passed through kwargs to the LLM chain

Outputs

Name	Type	Description
return	dict	A dictionary of inputs for the LLM chain, including the combined document string under the document_variable_name key and any generation parameters
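The shape of the returned dictionary can be sketched as follows. build_llm_inputs is a hypothetical helper, not part of ColossalQA or LangChain. Stock LangChain's StuffDocumentsChain keeps only the kwargs that match the prompt's input variables. This sketch additionally passes through the generation parameters listed in the Inputs table, which is an assumption based on that table rather than on the source code.

```python
from typing import Any, Dict, Iterable


def build_llm_inputs(
    combined_docs: str,
    document_variable_name: str,
    prompt_input_variables: Iterable[str],
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the returned-dict shape described above."""
    # Assumed pass-through set, taken from the Inputs table above.
    generation_keys = {"stop", "temperature", "top_k", "top_p", "max_new_tokens"}
    allowed = set(prompt_input_variables) | generation_keys
    # Keep prompt variables (e.g. "question") and generation parameters; drop the rest.
    inputs = {k: v for k, v in kwargs.items() if k in allowed}
    # The stuffed document string goes under the configured variable name.
    inputs[document_variable_name] = combined_docs
    return inputs


inputs = build_llm_inputs(
    "Supporting Document0: ColossalAI is a distributed training framework.",
    document_variable_name="context",
    prompt_input_variables=["context", "question"],
    question="What is ColossalAI?",
    temperature=0.1,
    unrelated="ignored",
)
print(inputs)
```

Here "unrelated" is dropped because it is neither a prompt variable nor a listed generation parameter.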

Usage Examples

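from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.docstore.document import Document
from langchain.llms.fake import FakeListLLM

# FakeListLLM is a LangChain test stub that returns canned responses, used here
# so the example runs without a real model; replace it with your own LLM.
my_llm = FakeListLLM(responses=["ColossalAI is a distributed training framework."])

# Create an LLM chain whose prompt has a {context} slot for the stuffed documents
prompt = PromptTemplate.from_template("Answer based on context: {context}\nQuestion: {question}")
llm_chain = LLMChain(llm=my_llm, prompt=prompt)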

# Create the custom stuff chain
chain = CustomStuffDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name="context",
)

# Run with documents
docs = [
    Document(page_content="ColossalAI is a distributed training framework."),
    Document(page_content="It supports various parallelism strategies."),
]
result = chain.run(input_documents=docs, question="What is ColossalAI?")

Related Pages

Environment:Hpcaitech_ColossalAI_ColossalQA_RAG_Environment
