Implementation:InternLM Lmdeploy APIClient

Knowledge Sources	LMDeploy API Server
Domains	LLM_Serving, Client_SDK
Last Updated	2026-02-07 15:00 GMT

Overview

APIClient is the Python client shipped with the LMDeploy library for connecting to a running LMDeploy API server.

Description

The APIClient class provides a Python interface for consuming LMDeploy's OpenAI-compatible HTTP API. It wraps the requests library and supports both streaming and non-streaming requests for chat completions and text completions. As an alternative, users can use the standard OpenAI Python SDK by pointing base_url at the LMDeploy server's /v1 path.

Usage

Import this class when building Python applications that need to communicate with a running LMDeploy API server. Use the OpenAI SDK alternative for drop-in compatibility with existing OpenAI-based code.

Code Reference

Source Location

Repository: lmdeploy
File: lmdeploy/serve/openai/api_client.py
Lines: L38-58 (__init__), L90-173 (chat_completions_v1), L175-196 (completions_v1)

Signature

class APIClient:
    def __init__(self, api_server_url: str, api_key: Optional[str] = None):
        ...

    def chat_completions_v1(self,
                            model: str,
                            messages: Union[str, List[Dict]],
                            temperature: float = 0.7,
                            top_p: float = 1.0,
                            n: int = 1,
                            max_tokens: Optional[int] = None,
                            stop: Optional[Union[str, List[str]]] = None,
                            stream: bool = False,
                            **kwargs) -> Iterator[dict]:
        ...

    def completions_v1(self,
                       model: str,
                       prompt: str,
                       **kwargs) -> Iterator[dict]:
        ...

Import

from lmdeploy.serve.openai.api_client import APIClient

I/O Contract

Inputs

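Name	Type	Required	Description
api_server_url	str	Yes	URL of running LMDeploy server (e.g., 'http://localhost:23333')
api_key	str	No	Authentication key (default: None)
model	str	Yes	Model name (from /v1/models)
messages	str or List[Dict]	Yes	OpenAI-format messages (chat_completions_v1 only)
prompt	str	Yes	Prompt text (completions_v1 only)
temperature	float	No	Sampling temperature (default: 0.7)
top_p	float	No	Nucleus sampling threshold (default: 1.0)
n	int	No	Number of completions to generate per request (default: 1)
max_tokens	int	No	Maximum number of tokens to generate (default: None)
stop	str or List[str]	No	Stop sequence(s) that end generation (default: None)
stream	bool	No	Enable streaming (default: False)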
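
The model argument has to match a name the server reports at /v1/models. The following is a minimal sketch of looking that name up with only the standard library, assuming the endpoint returns the OpenAI-style list payload ({"data": [{"id": ...}]}). The helper names parse_model_ids and fetch_model_ids are illustrative and not part of LMDeploy, and a server that requires an API key would likely also need an Authorization header, which this sketch omits.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def parse_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from an OpenAI-style /v1/models payload."""
    # Entries without an "id" field are skipped rather than raising.
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(api_server_url: str, timeout: float = 10.0) -> List[str]:
    """Query <api_server_url>/v1/models and return the served model names.

    Hypothetical helper: assumes an unauthenticated server at the given root URL.
    """
    url = api_server_url.rstrip("/") + "/v1/models"
    with urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With a server running locally, fetch_model_ids('http://localhost:23333') would return the names that can be passed as model.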

Outputs

Name	Type	Description
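response	Iterator[dict]	OpenAI-format JSON response dicts. A non-streaming chat call yields a single dict with the text in choices[0].message.content; under the OpenAI convention, streaming chat chunks carry incremental text in choices[0].delta.content, and text completions carry it in choices[0].text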

Usage Examples

Native APIClient

from lmdeploy.serve.openai.api_client import APIClient

client = APIClient('http://localhost:23333')

# Non-streaming
for response in client.chat_completions_v1(
    model='internlm2_5-7b-chat',
    messages=[{"role": "user", "content": "Hello!"}],
    stream=False
):
    print(response['choices'][0]['message']['content'])
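
The loop above reads message.content, which is the non-streaming shape. With stream=True, OpenAI-format chunks carry incremental text under choices[0].delta.content instead. The following is a minimal sketch of a helper that handles both shapes, assuming LMDeploy's yielded dicts follow that OpenAI convention; collect_text is a hypothetical name, not an LMDeploy function.

```python
from typing import Any, Dict, Iterable


def collect_text(responses: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the first choice's text across all yielded response dicts."""
    parts = []
    for response in responses:
        choices = response.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        # Streaming chunks carry incremental text under "delta";
        # non-streaming responses carry the full text under "message".
        body = choice.get("delta") or choice.get("message") or {}
        content = body.get("content")
        if content:
            parts.append(content)
    return "".join(parts)
```

It could be used as collect_text(client.chat_completions_v1(model='internlm2_5-7b-chat', messages=[{"role": "user", "content": "Hello!"}], stream=True)), which consumes the iterator and returns the full reply as one string.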

OpenAI SDK Alternative

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:23333/v1',
    api_key='your-key'
)

response = client.chat.completions.create(
    model='internlm2_5-7b-chat',
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
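
Note that the two examples take different URLs: APIClient is given the server root ('http://localhost:23333'), while the OpenAI SDK's base_url includes the /v1 suffix. A small sketch of deriving one from the other follows; openai_base_url is a hypothetical helper introduced here for illustration.

```python
def openai_base_url(api_server_url: str) -> str:
    """Return the base_url the OpenAI SDK expects for a given server root URL."""
    root = api_server_url.rstrip("/")
    # Avoid doubling the suffix if the caller already passed a /v1 URL.
    return root if root.endswith("/v1") else root + "/v1"
```

Keeping a single server URL in configuration and deriving the SDK's base_url from it avoids the common mistake of passing the root URL to the OpenAI client.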

Related Pages

Implements Principle

Principle:InternLM_Lmdeploy_API_Client_Integration
