Implementation:FlagOpen FlagEmbedding BGE Coder LLM Client


Knowledge Sources
Domains: Large Language Models, API Client, Data Generation
Last Updated: 2026-02-09 00:00 GMT

Overview

A unified LLM client class supporting OpenAI, Azure OpenAI, and open-source models for generating training data.

Description

The LLM class provides a unified interface for interacting with various language model APIs, including OpenAI, Azure OpenAI, and locally hosted open-source models via vLLM. It handles API connection management, retry logic with exponential backoff, timeout handling with threading, token-based text splitting using tiktoken, and removal of thinking tags from model outputs. The client includes robust error handling for rate limits, connection errors, and content filtering, with configurable temperature, top-p, and repetition penalty parameters.
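
To make the thinking-tag removal concrete, here is a minimal sketch. It assumes the tags have the form <think>...</think>, which this page does not specify, and strip_thinking is a hypothetical helper name rather than a method of the LLM class.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>; the real client
# may use a different tag or a different stripping strategy.
_THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove every <think>...</think> span and trim surrounding whitespace."""
    return _THINK_RE.sub("", text).strip()


if __name__ == "__main__":
    raw = "<think>plan the answer\nstep by step</think>\nAnswer: 42"
    print(strip_thinking(raw))
```

The non-greedy pattern with re.DOTALL lets a thinking span cover several lines without swallowing text between two separate spans.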

Usage

Use this class when generating synthetic training data for BGE-Coder, such as creating query-document pairs or generating code completions. It is suitable for batch processing with automatic retry logic, interacting with multiple LLM providers through a single interface, and handling long-running generation tasks with timeout protection.
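
The retry and timeout behavior mentioned above could look roughly like the following sketch. The helper name call_with_retry, its parameters, and the delay schedule are assumptions made for illustration; the page does not show how the LLM class implements them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retry(fn, max_retries=3, base_delay=1.0, timeout=90.0,
                    sleep=time.sleep):
    """Run fn() in a worker thread, retrying with exponential backoff.

    Returns fn's result, or None if every attempt fails or times out.
    Hypothetical helper: not the actual LLM.chat implementation.
    """
    for attempt in range(max_retries):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            # result() raises a timeout error if fn takes longer than `timeout`.
            return pool.submit(fn).result(timeout=timeout)
        except Exception:
            # Covers timeouts as well as errors raised by fn itself
            # (for example rate-limit or connection errors).
            pass
        finally:
            # Do not block on a stuck worker; Python threads cannot be
            # killed, so a timed-out call keeps running in the background.
            pool.shutdown(wait=False)
        if attempt < max_retries - 1:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

With a real client, fn could be something like lambda: llm.chat(prompt), so that one slow or failing request does not stall a whole batch.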

Code Reference

Source Location

Signature

from typing import Tuple


class LLM:
    def __init__(
        self,
        model: str = "Qwen2-5-Coder-32B-Instruct",
        model_type: str = "open-source",
        port: int = 8000,
    ):
        pass

    def chat(
        self,
        prompt: str,
        max_tokens: int = 8192,
        logit_bais: dict = None,
        n: int = 1,
        temperature: float = 1.0,
        top_p: float = 0.6,
        repetition_penalty: float = 1.0,
        remove_thinking: bool = True,
        timeout: int = 90,
    ):
        """Send chat completion request and return responses"""

    def split_text(self, text: str, anchor_points: Tuple[float, float] = (0.4, 0.7)):
        """Split text at a random anchor point based on tokens"""

Import

from llm import LLM

I/O Contract

Inputs

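Name | Type | Required | Description
model | str | No | Model name or identifier (default: "Qwen2-5-Coder-32B-Instruct")
model_type | str | No | One of "open-source", "azure", or "openai" (default: "open-source")
port | int | No | Port for the open-source model endpoint (default: 8000)
prompt | str | Yes | Input prompt for generation
max_tokens | int | No | Maximum number of tokens to generate (default: 8192)
n | int | No | Number of completions to generate (default: 1)
temperature | float | No | Sampling temperature (default: 1.0)
top_p | float | No | Nucleus sampling threshold (default: 0.6)
repetition_penalty | float | No | Repetition penalty (default: 1.0)
remove_thinking | bool | No | Whether to remove thinking tags from the output (default: True)
timeout | int | No | Timeout for the request (default: 90)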

Outputs

Name | Type | Description
responses | List[str] | List of generated text completions (None if the request failed)
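
Because the output is None when a request fails, batch jobs need to separate successes from failures. The sketch below shows one way to do that. collect_responses and its generate argument are hypothetical names; generate stands in for any callable with the same return contract as chat.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def collect_responses(
    prompts: Iterable[str],
    generate: Callable[[str], Optional[List[str]]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Call generate() for each prompt and separate successes from failures.

    Returns (results, failed): results maps each prompt to its completions,
    failed lists the prompts for which generate() returned None or nothing.
    """
    results: Dict[str, List[str]] = {}
    failed: List[str] = []
    for prompt in prompts:
        responses = generate(prompt)
        if not responses:
            # None (failure) or an empty list: remember it for a later retry.
            failed.append(prompt)
            continue
        results[prompt] = list(responses)
    return results, failed
```

With the real client, generate could be lambda p: llm.chat(p, n=2), and the failed list can be fed back in for another pass.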

Usage Examples

# Example 1: Using the OpenAI API
import os

from llm import LLM

# Replace the placeholder with a real key, or export it in the shell instead.
os.environ["OPENAI_API_KEY"] = "your-api-key"

llm = LLM(
    model="gpt-4o-mini-2024-07-18",
    model_type="openai",
)

prompt = "Write a Python function to compute Fibonacci numbers."
responses = llm.chat(prompt, n=2, temperature=0.7)
# chat() returns None on failure, so guard before iterating.
for i, response in enumerate(responses or []):
    print(f"Response {i+1}: {response}")

# Example 2: Using a local open-source model
llm_local = LLM(
    model="Qwen2-5-Coder-32B-Instruct",
    model_type="open-source",
    port=8000,
)

responses = llm_local.chat("Explain the binary search algorithm.")
# chat() returns None on failure, so check before indexing.
if responses:
    print(responses[0])

# Example 3: Text splitting for data generation
text = "This is a long piece of code..." * 100
first_half, second_half = llm.split_text(text, anchor_points=(0.3, 0.7))
print(f"First part length: {len(first_half)}")
print(f"Second part length: {len(second_half)}")
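
For readers curious how a token-based split such as the one in Example 3 might work, here is a minimal sketch. It assumes the cut position is drawn uniformly between the two anchor ratios, which this page does not confirm. split_at_random_anchor is a hypothetical function, and the encode/decode pair is passed in so that a tiktoken encoding or any other tokenizer can be used.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def split_at_random_anchor(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    anchor_points: Tuple[float, float] = (0.4, 0.7),
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Split text at a token index chosen between the two anchor ratios."""
    rng = rng or random.Random()
    tokens = encode(text)
    low, high = anchor_points
    # Assumption: uniform sampling of the cut ratio within [low, high].
    ratio = rng.uniform(low, high)
    cut = int(len(tokens) * ratio)
    return decode(tokens[:cut]), decode(tokens[cut:])


if __name__ == "__main__":
    # Whitespace "tokenizer" used only as a stand-in for a real one.
    sample = " ".join(f"w{i}" for i in range(100))
    head, tail = split_at_random_anchor(sample, str.split, " ".join)
    print(len(head.split()), len(tail.split()))
```

With tiktoken installed, the encode and decode methods of an encoding object (for example one obtained from tiktoken.get_encoding) could be passed in place of the whitespace stand-in.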
