Implementation:FlagOpen FlagEmbedding BGE Coder LLM Client

Knowledge Sources	FlagOpen_FlagEmbedding
Domains	Large Language Models, API Client, Data Generation
Last Updated	2026-02-09 00:00 GMT

Overview

A unified LLM client class supporting OpenAI, Azure OpenAI, and open-source models for generating training data.

Description

The LLM class provides a unified interface to several language model APIs: OpenAI, Azure OpenAI, and locally hosted open-source models served through vLLM. It manages API connections, retries failed requests with exponential backoff, enforces timeouts using threading, splits text by token count using tiktoken, and removes thinking tags from model outputs. It handles rate-limit errors, connection errors, and content-filter rejections, and exposes configurable temperature, top-p, and repetition-penalty parameters.
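The following sketch shows retry with exponential backoff. It is a generic, stdlib-only illustration, not the code in llm.py.

Several details are assumptions:
- the helper name `retry_with_backoff` and its parameters;
- the delay schedule;
- the use of the built-in `ConnectionError`/`TimeoutError` as stand-ins for the API's rate-limit and connection exceptions.

With the `openai` v1 SDK, one would pass exception classes such as `openai.RateLimitError` and `openai.APIConnectionError` instead.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    # Assumption: stand-ins for the provider's transient-error exceptions.
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn, retrying transient failures with exponential backoff plus jitter.

    Returns None once every attempt has failed, matching the
    'None if failed' output convention described on this page.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                break  # out of attempts; give up
            # Delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay; the
            # random jitter keeps parallel workers from retrying in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
    return None
```

Injecting `sleep` keeps the helper testable without real waiting. Capping the delay bounds the worst-case wait for long batch runs.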

Usage

Use this class when generating synthetic training data for BGE-Coder, such as creating query-document pairs or generating code completions. It is suitable for batch processing with automatic retry logic, interacting with multiple LLM providers through a single interface, and handling long-running generation tasks with timeout protection.
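One common threading pattern for timeout protection runs the request in a worker thread and stops waiting after a deadline. The sketch below assumes that pattern. `run_with_timeout` is a hypothetical helper, and llm.py may structure its timeout handling differently.

```python
import threading
from typing import Any, Callable, Optional


def run_with_timeout(fn: Callable[[], Any], timeout: float) -> Optional[Any]:
    """Run fn in a worker thread; return its result, or None after `timeout` seconds.

    Python cannot forcibly stop a thread, so a timed-out call keeps running
    as a daemon thread in the background and its result is discarded.
    """
    result: dict = {}

    def worker() -> None:
        try:
            result["value"] = fn()
        except Exception as exc:  # hand the worker's exception back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return None  # deadline passed; treat as a failed request
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

Returning None on timeout lets a caller treat a hung request like any other failure. For example, the caller can retry it with backoff.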

Code Reference

Source Location

Repository: FlagOpen_FlagEmbedding
File: research/BGE_Coder/data_generation/llm.py
Lines: 1-134

Signature

from typing import Tuple

class LLM:
    def __init__(
        self,
        model: str = "Qwen2-5-Coder-32B-Instruct",
        model_type: str = "open-source",
        port: int = 8000,
    ):
        pass

    def chat(
        self,
        prompt: str,
        max_tokens: int = 8192,
        logit_bais: dict = None,
        n: int = 1,
        temperature: float = 1.0,
        top_p: float = 0.6,
        repetition_penalty: float = 1.0,
        remove_thinking: bool = True,
        timeout: int = 90,
    ):
        """Send chat completion request and return responses"""

    def split_text(self, text: str, anchor_points: Tuple[float, float] = (0.4, 0.7)):
        """Split text at a random anchor point based on tokens"""

Import

from llm import LLM

I/O Contract

Inputs

Name	Type	Required	Description
model	str	No	Model name or identifier (default: "Qwen2-5-Coder-32B-Instruct")
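model_type	str	No	One of "open-source", "azure", or "openai" (default: "open-source")
port	int	No	Port for open-source model endpoint (default: 8000)
prompt	str	Yes	Input prompt for generation
max_tokens	int	No	Maximum tokens to generate (default: 8192)
logit_bais	dict	No	Optional logit-bias mapping (default: None)
n	int	No	Number of completions to generate (default: 1)
temperature	float	No	Sampling temperature (default: 1.0)
top_p	float	No	Nucleus-sampling probability threshold (default: 0.6)
repetition_penalty	float	No	Penalty applied to repeated tokens; 1.0 means no penalty (default: 1.0)
remove_thinking	bool	No	Strip thinking tags from generated outputs (default: True)
timeout	int	No	Time limit for each request, presumably in seconds (default: 90)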
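A hypothetical helper can illustrate how `model_type` and `port` interact. The sketch makes two assumptions:
- The open-source path talks to a vLLM OpenAI-compatible server on localhost. vLLM serves that API under `/v1`, on port 8000 by default.
- The `azure` and `openai` paths defer to the SDK's own endpoint configuration.

Neither `resolve_base_url` nor its `host` parameter is part of llm.py.

```python
from typing import Optional

VALID_MODEL_TYPES = ("open-source", "azure", "openai")


def resolve_base_url(model_type: str, port: int = 8000, host: str = "localhost") -> Optional[str]:
    """Hypothetical: choose an API base URL for the given model_type.

    open-source -> local vLLM OpenAI-compatible server (served under /v1)
    azure/openai -> None, meaning "use the SDK's own endpoint settings"
    """
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(f"model_type must be one of {VALID_MODEL_TYPES}, got {model_type!r}")
    if model_type == "open-source":
        # Only the open-source path uses `port`; hosted APIs ignore it.
        return f"http://{host}:{port}/v1"
    return None
```

With the `openai` v1 SDK, the returned URL could be passed as `OpenAI(base_url=url, api_key="EMPTY")`. vLLM accepts a placeholder key unless the server was started with `--api-key`.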

Outputs

Name	Type	Description
responses	List[str]	List of generated text completions (None if failed)
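Outputs may contain failed (None) completions. When `remove_thinking` is enabled, reasoning blocks are also expected to be stripped. Downstream code therefore usually post-processes the list.

The sketch below makes these assumptions:
- Thinking blocks are delimited by `<think>...</think>`, the format emitted by reasoning models such as DeepSeek-R1. The exact tags llm.py handles are not confirmed here.
- The helper names are hypothetical.
- It accepts either reading of "None if failed": a None list or None entries.

```python
import re
from typing import List, Optional

# Assumption: reasoning is wrapped in <think>...</think>; DOTALL lets it span lines.
_THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def remove_thinking_tags(text: Optional[str]) -> Optional[str]:
    """Strip <think>...</think> blocks; keep None so failures stay visible."""
    if text is None:
        return None
    return _THINK_RE.sub("", text).strip()


def clean_responses(responses: Optional[List[Optional[str]]]) -> List[str]:
    """Strip thinking blocks, then drop failed (None) or empty completions."""
    if responses is None:
        return []
    cleaned = (remove_thinking_tags(r) for r in responses)
    return [r for r in cleaned if r]
```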

Usage Examples

# Example 1: Using OpenAI API
import os
from llm import LLM
os.environ["OPENAI_API_KEY"] = "your-api-key"

llm = LLM(
    model="gpt-4o-mini-2024-07-18",
    model_type="openai"
)

prompt = "Write a Python function to compute Fibonacci numbers."
responses = llm.chat(prompt, n=2, temperature=0.7)
for i, response in enumerate(responses):
    print(f"Response {i+1}: {response}")

# Example 2: Using local open-source model
llm_local = LLM(
    model="Qwen2-5-Coder-32B-Instruct",
    model_type="open-source",
    port=8000
)

response = llm_local.chat("Explain binary search algorithm.")[0]
print(response)

# Example 3: Text splitting for data generation
text = "This is a long piece of code..." * 100
first_half, second_half = llm.split_text(text, anchor_points=(0.3, 0.7))
print(f"First part length: {len(first_half)}")
print(f"Second part length: {len(second_half)}")
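The token-based split behind `split_text` can be sketched as follows. The sketch assumes the method draws a cut ratio uniformly from `anchor_points` and cuts the token sequence at that position.

To stay dependency-free, the sketch defaults to whitespace tokens. Passing `enc.encode` and `enc.decode`, where `enc = tiktoken.get_encoding("cl100k_base")`, would make it token-based in the tiktoken sense. The encoding name, the uniform-ratio behavior, and the `encode`/`decode`/`rng` parameters are illustrative assumptions.

```python
import random
from typing import Callable, Optional, Tuple


def split_text(
    text: str,
    anchor_points: Tuple[float, float] = (0.4, 0.7),
    encode: Callable[[str], list] = str.split,  # stand-in tokenizer (assumption)
    decode: Callable[[list], str] = " ".join,   # inverse of the stand-in tokenizer
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Split text at a random token position between the two anchor ratios."""
    if rng is None:
        rng = random.Random()
    tokens = encode(text)
    low, high = anchor_points
    # Assumption: the cut ratio is drawn uniformly from [low, high].
    cut = int(len(tokens) * rng.uniform(low, high))
    return decode(tokens[:cut]), decode(tokens[cut:])
```

Cutting on token boundaries rather than characters avoids splitting inside a token. This matters when the first part is fed to a model as a prefix to complete.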
