Principle: KServe OpenAI-Compatible Inference
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, API_Design, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
An API compatibility pattern that serves LLM inference through OpenAI-compatible REST endpoints, enabling drop-in replacement of OpenAI API calls.
Description
OpenAI-Compatible Inference provides standard REST endpoints that match the OpenAI API specification:
- `/v1/completions` — Text completion
- `/v1/chat/completions` — Chat-based completion with message history
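To make the two request shapes concrete, here is a minimal sketch of the request bodies each endpoint accepts. The model name `my-llm` is a placeholder, not a name from the source:

```python
import json

# Hypothetical model name; in practice it must match the model name
# declared in the LLMInferenceService spec.
MODEL = "my-llm"

# /v1/completions body: a raw prompt in, generated text out.
completion_request = {
    "model": MODEL,
    "prompt": "KServe is",
    "max_tokens": 32,
}

# /v1/chat/completions body: a list of role-tagged messages,
# carrying the conversation history.
chat_request = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "What is KServe?"},
    ],
}

print(json.dumps(chat_request))
```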
vLLM implements these endpoints natively, and KServe routes requests to vLLM pods through Envoy Gateway HTTPRoutes. This enables applications built for the OpenAI API to use self-hosted models without code changes.
Usage
Send standard OpenAI API requests to the LLMInferenceService endpoint. The model name in the request body should match the model name in the LLMInferenceService spec.
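A minimal sketch of such a request using only the standard library; the gateway host, namespace, and service name are placeholder values, and the request is constructed but not sent:

```python
import json
import urllib.request

# Hypothetical endpoint: substitute your gateway host plus the
# /<namespace>/<name> prefix of your LLMInferenceService.
BASE_URL = "http://gateway.example.com/demo-ns/my-llm"

body = json.dumps({
    "model": "my-llm",  # must match the model name in the LLMInferenceService spec
    "messages": [{"role": "user", "content": "Hello"}],
}).encode("utf-8")

req = urllib.request.Request(
    BASE_URL + "/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(req) would send it; the gateway strips the
# /<namespace>/<name> prefix before the request reaches the vLLM pod.
print(req.full_url)
```

Because the path and payload follow the OpenAI specification, an existing OpenAI client can be pointed at `BASE_URL` instead of the hosted API.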
Theoretical Basis
```
# OpenAI API compatibility (NOT implementation code)
Endpoints:
  POST /v1/completions
    Input:  {"model": "<name>", "prompt": "...", "max_tokens": N}
    Output: {"choices": [{"text": "...", "finish_reason": "stop"}], "usage": {...}}
  POST /v1/chat/completions
    Input:  {"model": "<name>", "messages": [{"role": "user", "content": "..."}]}
    Output: {"choices": [{"message": {"role": "assistant", "content": "..."}}]}

Routing path:
  Client → Envoy Gateway → HTTPRoute → InferencePool → vLLM pod
  URL rewrite: /<namespace>/<name>/v1/completions → /v1/completions
```
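The prefix rewrite can be sketched as a pure function. This is illustrative only; in the actual deployment Envoy Gateway performs the rewrite, and the namespace and name values are placeholders:

```python
def rewrite_path(path: str, namespace: str, name: str) -> str:
    """Strip the /<namespace>/<name> prefix so the vLLM pod
    receives a bare OpenAI-style path."""
    prefix = f"/{namespace}/{name}"
    if not path.startswith(prefix + "/"):
        raise ValueError(f"path {path!r} does not start with {prefix!r}")
    return path[len(prefix):]

print(rewrite_path("/demo-ns/my-llm/v1/completions", "demo-ns", "my-llm"))
# → /v1/completions
```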