Implementation:Hpcaitech ColossalAI CustomStuffDocumentsChain
| Knowledge Sources | |
|---|---|
| Domains | NLP, Question Answering, Document Processing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
CustomStuffDocumentsChain is a custom extension of LangChain's StuffDocumentsChain. It stuffs all input documents into a single context string, labels each document with a numbered prefix, and can replace a document's content with key-value metadata.
Description
This class overrides the _get_inputs method of LangChain's StuffDocumentsChain to add custom document formatting. Each document is prefixed with a numbered label, starting from 0 (e.g., "Supporting Document0:", "Supporting Document1:"). Key-value documents are also supported: when a document's metadata sets the is_key_value_mapping flag, its page content is replaced by the value stored in its metadata. The formatted documents are then joined with the chain's document_separator into a single string, which is passed to the LLM chain.
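The formatting step can be approximated in plain Python. This is a minimal, dependency-free sketch, not the ColossalQA source. Doc and stuff_documents are hypothetical names; Doc stands in for LangChain's Document. Three details are assumptions: the replacement value is read from a "value" metadata key, a newline follows each prefix, and the "\n\n" default mirrors LangChain's default document_separator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def stuff_documents(
    docs: List[Doc],
    doc_prefix: str = "Supporting Document",
    separator: str = "\n\n",
) -> str:
    """Combine documents into one context string with numbered prefixes."""
    parts = []
    for i, doc in enumerate(docs):
        content = doc.page_content
        # Key-value documents: use the stored value (assumed "value" key) instead of the indexed key
        if doc.metadata.get("is_key_value_mapping", False) and "value" in doc.metadata:
            content = str(doc.metadata["value"])
        # Numbering starts at 0, e.g. "Supporting Document0:" (newline after the colon is assumed)
        parts.append(f"{doc_prefix}{i}:\n{content}")
    return separator.join(parts)


docs = [
    Doc("ColossalAI is a distributed training framework."),
    Doc(
        "What does ColossalAI support?",
        {"is_key_value_mapping": True, "value": "Various parallelism strategies."},
    ),
]
print(stuff_documents(docs))
```

In this sketch, the second document is indexed by its question (the key), but the context receives its answer (the value). This is the transformation the flag enables.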
Usage
Use CustomStuffDocumentsChain when building a ColossalQA retrieval pipeline that needs custom document formatting. This includes numbered document prefixes for clearer context presentation and key-value transformations applied to documents before they are passed to the LLM.
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalQA/colossalqa/chain/retrieval_qa/stuff.py
- Lines: 1-92
Signature
```python
class CustomStuffDocumentsChain(StuffDocumentsChain):
    def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
        ...
```
Import
```python
from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| docs | List[Document] | Yes | List of LangChain Document objects to format and combine into a single input string |
| doc_prefix | str | No | Prefix label for each document in the combined context (default: "Supporting Document") |
| stop | list | No | Stop token list passed through kwargs to the LLM chain |
| temperature | float | No | Temperature parameter passed through kwargs to the LLM chain |
| top_k | int | No | Top-k sampling parameter passed through kwargs to the LLM chain |
| top_p | float | No | Top-p sampling parameter passed through kwargs to the LLM chain |
| max_new_tokens | int | No | Maximum new tokens parameter passed through kwargs to the LLM chain |
Outputs
| Name | Type | Description |
|---|---|---|
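| return | dict | A dictionary of inputs for the LLM chain. It contains the combined document string under the document_variable_name key, plus any kwargs that are generation parameters (stop, temperature, top_k, top_p, max_new_tokens) or input variables of the LLM chain's prompt |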
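The assembly of the returned dictionary can be approximated as follows. This is a hedged sketch rather than the ColossalQA code. build_llm_inputs and GENERATION_ARGS are hypothetical names. The sketch assumes that a kwarg is kept only if it is one of the listed generation parameters or an input variable of the LLM chain's prompt, and that the combined document string is stored under document_variable_name.

```python
from typing import Any, Dict, Iterable

# Assumed generation parameters forwarded alongside the prompt's input variables
GENERATION_ARGS = ("stop", "temperature", "top_k", "top_p", "max_new_tokens")


def build_llm_inputs(
    combined_docs: str,
    prompt_variables: Iterable[str],
    document_variable_name: str = "context",
    **kwargs: Any,
) -> Dict[str, Any]:
    """Keep only kwargs the LLM chain can use, then add the stuffed document string."""
    allowed = set(GENERATION_ARGS) | set(prompt_variables)
    inputs = {k: v for k, v in kwargs.items() if k in allowed}
    inputs[document_variable_name] = combined_docs
    return inputs


inputs = build_llm_inputs(
    "Supporting Document0:\nColossalAI is a distributed training framework.",
    prompt_variables=["context", "question"],
    question="What is ColossalAI?",
    temperature=0.2,
    doc_prefix="Supporting Document",  # consumed during formatting, so filtered out here (assumed)
)
print(sorted(inputs))  # ['context', 'question', 'temperature']
```

In this sketch, filtering keeps formatting-only options such as doc_prefix out of the LLM call. Prompt variables such as question still reach the template.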
Usage Examples
```python
from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

# my_llm is a placeholder: any LangChain-compatible LLM instance defined beforehand
# Create an LLM chain whose prompt expects {context} and {question}
prompt = PromptTemplate.from_template("Answer based on context: {context}\nQuestion: {question}")
llm_chain = LLMChain(llm=my_llm, prompt=prompt)

# Create the custom stuff chain; the combined documents fill the {context} variable
chain = CustomStuffDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name="context",
)

# Run with documents; extra keyword arguments such as question are forwarded to the prompt
docs = [
    Document(page_content="ColossalAI is a distributed training framework."),
    Document(page_content="It supports various parallelism strategies."),
]
result = chain.run(input_documents=docs, question="What is ColossalAI?")
```