Implementation:FlagOpen FlagEmbedding BGE Coder LLM Client
| Knowledge Sources | |
|---|---|
| Domains | Large Language Models, API Client, Data Generation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A unified LLM client class supporting OpenAI, Azure OpenAI, and open-source models for generating training data.
Description
The LLM class provides a unified interface for interacting with various language model APIs, including OpenAI, Azure OpenAI, and locally hosted open-source models via vLLM. It handles API connection management, retry logic with exponential backoff, timeout handling with threading, token-based text splitting using tiktoken, and removal of thinking tags from model outputs. The client includes robust error handling for rate limits, connection errors, and content filtering, with configurable temperature, top-p, and repetition penalty parameters.
Usage
Use this class when generating synthetic training data for BGE-Coder, such as creating query-document pairs or generating code completions. It is suitable for batch processing with automatic retry logic, interacting with multiple LLM providers through a single interface, and handling long-running generation tasks with timeout protection.
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/BGE_Coder/data_generation/llm.py
- Lines: 1-134
Signature
class LLM:
def __init__(
self,
model: str="Qwen2-5-Coder-32B-Instruct",
model_type: str = "open-source",
port: int = 8000,
):
pass
def chat(
self,
prompt: str,
max_tokens: int = 8192,
logit_bais: dict = None,
n: int = 1,
temperature: float = 1.0,
top_p: float = 0.6,
repetition_penalty: float = 1.0,
remove_thinking: bool = True,
timeout: int = 90,
):
"""Send chat completion request and return responses"""
def split_text(self, text: str, anchor_points: Tuple[float, float] = (0.4, 0.7)):
"""Split text at a random anchor point based on tokens"""
Import
from llm import LLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | No | Model name or identifier (default: "Qwen2-5-Coder-32B-Instruct") |
| model_type | str | No | One of "open-source", "azure", or "openai" |
| port | int | No | Port for open-source model endpoint (default: 8000) |
| prompt | str | Yes | Input prompt for generation |
| max_tokens | int | No | Maximum tokens to generate (default: 8192) |
| temperature | float | No | Sampling temperature (default: 1.0) |
| n | int | No | Number of completions to generate (default: 1) |
Outputs
| Name | Type | Description |
|---|---|---|
| responses | List[str] | List of generated text completions (None if failed) |
Usage Examples
# Example 1: Using OpenAI API
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
llm = LLM(
model="gpt-4o-mini-2024-07-18",
model_type="openai"
)
prompt = "Write a Python function to compute Fibonacci numbers."
responses = llm.chat(prompt, n=2, temperature=0.7)
for i, response in enumerate(responses):
print(f"Response {i+1}: {response}")
# Example 2: Using local open-source model
llm_local = LLM(
model="Qwen2-5-Coder-32B-Instruct",
model_type="open-source",
port=8000
)
response = llm_local.chat("Explain binary search algorithm.")[0]
print(response)
# Example 3: Text splitting for data generation
text = "This is a long piece of code..." * 100
first_half, second_half = llm.split_text(text, anchor_points=(0.3, 0.7))
print(f"First part length: {len(first_half)}")
print(f"Second part length: {len(second_half)}")