
Implementation:Hpcaitech ColossalAI CustomStuffDocumentsChain

From Leeroopedia
Revision as of 15:08, 16 February 2026 by Admin (auto-imported from implementations/Hpcaitech_ColossalAI_CustomStuffDocumentsChain.md)


Knowledge Sources
Domains: NLP, Question Answering, Document Processing
Last Updated: 2026-02-09 00:00 GMT

Overview

CustomStuffDocumentsChain is a custom extension of LangChain's StuffDocumentsChain that combines documents by stuffing them into a single context string with numbered prefixes and support for key-value metadata replacement.

Description

This class overrides the _get_inputs method of LangChain's StuffDocumentsChain to add custom document formatting logic. Each document is prefixed with a zero-based numbered label (e.g., "Supporting Document0:", "Supporting Document1:"). The class also supports key-value metadata mapping: when a document's metadata sets the is_key_value_mapping flag, the document's page content is replaced by the corresponding value stored in its metadata. The formatted documents are then joined into a single string and passed to the LLM chain for processing.
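The formatting step can be sketched in plain Python. This is a simplified illustration, not ColossalQA's actual implementation: it uses a minimal Document stand-in instead of LangChain's class, and the function name stuff_documents, the "value" metadata key, the newline after each label, and the "\n\n" separator are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def stuff_documents(
    docs: List[Document],
    doc_prefix: str = "Supporting Document",
    separator: str = "\n\n",  # assumption: mirrors StuffDocumentsChain's default separator
) -> str:
    """Number each document, apply key-value replacement, and join the results."""
    parts = []
    for i, doc in enumerate(docs):
        content = doc.page_content
        # Assumption: the replacement text lives under a hypothetical "value" metadata key.
        if doc.metadata.get("is_key_value_mapping", False) and "value" in doc.metadata:
            content = doc.metadata["value"]
        # Zero-based numbering, e.g. "Supporting Document0:".
        parts.append(f"{doc_prefix}{i}:\n{content}")
    return separator.join(parts)


docs = [
    Document("ColossalAI is a distributed training framework."),
    Document(
        "question: what does ColossalAI support?",
        metadata={"is_key_value_mapping": True, "value": "It supports various parallelism strategies."},
    ),
]
print(stuff_documents(docs))
```

This sketch builds new strings and leaves the input documents unmodified, so the same documents can be reused in later calls without their page content having been overwritten.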

Usage

Use CustomStuffDocumentsChain when building a ColossalQA retrieval pipeline that needs custom document formatting: numbered document prefixes for clearer context presentation, and key-value transformations applied to documents before they are fed to the LLM.

Code Reference

Source Location

Signature

class CustomStuffDocumentsChain(StuffDocumentsChain):
    def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
        ...

Import

from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain

I/O Contract

Inputs

Name | Type | Required | Description
docs | List[Document] | Yes | List of LangChain Document objects to format and combine into a single input string
doc_prefix | str | No | Prefix label placed before each document's index in the combined context (default: "Supporting Document"), passed through kwargs
stop | List[str] | No | Stop token list, passed through kwargs to the LLM chain
temperature | float | No | Temperature parameter, passed through kwargs to the LLM chain
top_k | int | No | Top-k sampling parameter, passed through kwargs to the LLM chain
top_p | float | No | Top-p sampling parameter, passed through kwargs to the LLM chain
max_new_tokens | int | No | Maximum number of new tokens, passed through kwargs to the LLM chain

Outputs

Name | Type | Description
return | dict | Inputs for the LLM chain: the combined document string under the document_variable_name key, plus any prompt input variables (such as question) and generation parameters supplied through kwargs
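How the returned dictionary might be assembled from kwargs can be sketched as follows. The function name build_llm_inputs, the GENERATION_KEYS tuple, and the rule that any keyword argument that is neither a prompt input variable nor a generation parameter (including doc_prefix) is dropped are assumptions for illustration, not ColossalQA's confirmed behavior.

```python
from typing import Any, Dict, Iterable

# Assumption: generation parameters mirror the Inputs table; not taken from ColossalQA source.
GENERATION_KEYS = ("stop", "temperature", "top_k", "top_p", "max_new_tokens")


def build_llm_inputs(
    combined_docs: str,
    prompt_input_variables: Iterable[str],
    document_variable_name: str = "context",
    **kwargs: Any,
) -> Dict[str, Any]:
    """Keep only prompt variables and generation parameters, then add the stuffed documents."""
    allowed = set(GENERATION_KEYS) | set(prompt_input_variables)
    # Anything else (e.g. doc_prefix, already consumed during formatting) is dropped.
    inputs = {k: v for k, v in kwargs.items() if k in allowed}
    # The combined document string always goes under document_variable_name.
    inputs[document_variable_name] = combined_docs
    return inputs


inputs = build_llm_inputs(
    "Supporting Document0:\nColossalAI is a distributed training framework.",
    prompt_input_variables=["context", "question"],
    question="What is ColossalAI?",
    temperature=0.1,
    doc_prefix="Supporting Document",
)
print(inputs)
```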

Usage Examples

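from colossalqa.chain.retrieval_qa.stuff import CustomStuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate

# Placeholder LLM that returns a canned answer; replace it with the LLM
# used in your ColossalQA pipeline.
my_llm = FakeListLLM(responses=["ColossalAI is a distributed training framework."])

# Create an LLM chain whose prompt expects the stuffed documents under "context"
prompt = PromptTemplate.from_template("Answer based on context: {context}\nQuestion: {question}")
llm_chain = LLMChain(llm=my_llm, prompt=prompt)

# Create the custom stuff chain; document_variable_name must match a prompt variable
chain = CustomStuffDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name="context",
)

# Run with documents; extra keyword arguments such as question are forwarded to the prompt
docs = [
    Document(page_content="ColossalAI is a distributed training framework."),
    Document(page_content="It supports various parallelism strategies."),
]
result = chain.run(input_documents=docs, question="What is ColossalAI?")
print(result)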
