Implementation:Hpcaitech ColossalAI Chat Conversation
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Chat Template, RLHF |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Conversation template management for ColossalChat that handles chat formatting and message history.
Description
This module provides the Conversation dataclass and the setup_conversation_template factory function used throughout the ColossalChat RLHF training pipeline. The Conversation class wraps a tokenizer and its chat template configuration, managing system messages, message history, and prompt generation via the HuggingFace apply_chat_template method. The setup_conversation_template function handles loading chat templates from configuration dictionaries, tokenizer defaults, or external model paths, with optional saving of the resolved configuration to disk.
Usage
Use this module when setting up conversation templates for ColossalChat training or inference pipelines. It is essential for ensuring consistent prompt formatting across SFT, reward model training, and PPO/RLHF stages.
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalChat/coati/dataset/conversation.py
- Lines: 1-149
Signature
@dataclasses.dataclass
class Conversation:
tokenizer: PreTrainedTokenizer
system_message: str
chat_template: str
stop_ids: List[int]
end_of_assistant: str
roles = ["user", "assistant"]
@classmethod
def from_config(cls, tokenizer: PreTrainedTokenizer, config: Dict):
def clear(self):
def get_prompt(self, length: int = None, add_generation_prompt=False) -> Any:
def append_message(self, role: str, message: str):
def copy(self):
def setup_conversation_template(
tokenizer: PreTrainedTokenizer, chat_template_config: Dict = None, save_path: str = None
) -> Conversation:
Import
from coati.dataset.conversation import Conversation, setup_conversation_template
I/O Contract
Inputs (setup_conversation_template)
| Name | Type | Required | Description |
|---|---|---|---|
| tokenizer | PreTrainedTokenizer | Yes | The tokenizer to use for chat template application |
| chat_template_config | Dict | No | Configuration dict with keys: system_message, chat_template, stop_ids, end_of_assistant |
| save_path | str | No | Optional path to save the resolved conversation template config |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Conversation | A configured Conversation instance ready for prompt generation |
Usage Examples
from coati.dataset.conversation import Conversation, setup_conversation_template
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Setup from config
config = {
"system_message": "You are a helpful assistant.",
"chat_template": tokenizer.chat_template,
"stop_ids": [2],
"end_of_assistant": "</s>",
}
conv = setup_conversation_template(tokenizer, chat_template_config=config)
# Build a conversation
conv.append_message("user", "What is RLHF?")
conv.append_message("assistant", "RLHF stands for Reinforcement Learning from Human Feedback.")
prompt = conv.get_prompt(add_generation_prompt=True)