
Principle:KServe OpenAI-Compatible Inference

From Leeroopedia
Knowledge Sources
Domains LLM_Serving, API_Design, Inference
Last Updated 2026-02-13 00:00 GMT

Overview

An API compatibility pattern that serves LLM inference through OpenAI-compatible REST endpoints, enabling drop-in replacement of OpenAI API calls.

Description

OpenAI-Compatible Inference provides standard REST endpoints that match the OpenAI API specification:

  • /v1/completions — Text completion
  • /v1/chat/completions — Chat-based completion with message history

vLLM implements these endpoints natively, and KServe routes requests to vLLM pods through Envoy Gateway HTTPRoutes. This enables applications built for the OpenAI API to use self-hosted models without code changes.

Usage

Send standard OpenAI API requests to the LLMInferenceService endpoint. The model name in the request body should match the model name in the LLMInferenceService spec.
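As a minimal sketch of such a request, the snippet below builds a standard OpenAI chat-completions call against a self-hosted endpoint using only the Python standard library. The gateway host, the `llm-ns/my-llm` namespace/name prefix, and the `my-llm` model name are illustrative assumptions, not values from this page.

```python
import json
from urllib.request import Request

# Hypothetical gateway host plus the /<namespace>/<name> prefix that
# KServe's HTTPRoute later rewrites away (names are illustrative).
BASE_URL = "http://gateway.example.com/llm-ns/my-llm"

def build_chat_request(model: str, user_message: str) -> Request:
    """Build an OpenAI-style chat-completions request. The model name
    must match the one declared in the LLMInferenceService spec."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("my-llm", "Hello!")
```

Because the request body and path follow the OpenAI specification, the same payload works against api.openai.com or a self-hosted deployment; only the base URL changes.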

Theoretical Basis

# OpenAI API compatibility (NOT implementation code)
Endpoints:
  POST /v1/completions
    Input:  {"model": "<name>", "prompt": "...", "max_tokens": N}
    Output: {"choices": [{"text": "...", "finish_reason": "stop"}], "usage": {...}}

  POST /v1/chat/completions
    Input:  {"model": "<name>", "messages": [{"role": "user", "content": "..."}]}
    Output: {"choices": [{"message": {"role": "assistant", "content": "..."}}]}

Routing path:
  Client → Envoy Gateway → HTTPRoute → InferencePool → vLLM pod
  URL rewrite: /<namespace>/<name>/v1/completions → /v1/completions
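The URL rewrite above can be sketched as a small function that strips the `/<namespace>/<name>` prefix and forwards the bare OpenAI-style path to the vLLM pod. This is an illustration of the rewrite rule, not the gateway's actual implementation; the sample path is hypothetical.

```python
def rewrite_path(path: str) -> str:
    """Strip the /<namespace>/<name> prefix added by the gateway route,
    leaving the bare /v1/... path the vLLM pod expects."""
    parts = path.lstrip("/").split("/")
    # Expected shape: [namespace, name, "v1", ...]
    if len(parts) >= 3 and parts[2] == "v1":
        return "/" + "/".join(parts[2:])
    return path  # already a bare path; pass through unchanged

print(rewrite_path("/llm-ns/my-llm/v1/completions"))  # → /v1/completions
```

In the real deployment this rewrite happens inside Envoy Gateway via the HTTPRoute, so client code never needs to perform it; the sketch only makes the mapping explicit.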

Related Pages

Implemented By
