Implementation:Mlc ai Mlc llm OpenAIProtocol Kotlin
| Knowledge Sources | |
|---|---|
| Domains | Android, API Protocol, Serialization, OpenAI Compatibility |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
OpenAIProtocol (Kotlin) defines the serializable data classes for the OpenAI-compatible chat completion API used by the MLC LLM Android client, covering requests, responses, tool calls, log probabilities, and usage statistics.
Description
The OpenAIProtocol class is a container for a set of Kotlin @Serializable data classes that mirror the OpenAI Chat Completions API. These classes are used for both serializing requests to the native engine and deserializing streamed responses. The key types defined are:
Request types:
- ChatCompletionRequest -- The main request object containing messages, model name, sampling parameters (temperature, top_p, frequency_penalty, presence_penalty), max_tokens, stop sequences, stream flag, tools, and response_format.
- ChatCompletionMessage -- A single message with a role (system/user/assistant/tool), optional content, name, tool_calls, and tool_call_id.
- ChatCompletionMessageContent -- A polymorphic content type that can be either plain text or a list of parts (maps), with a custom serializer (ChatCompletionMessageContentSerializer) that handles both forms.
- ResponseFormat -- Specifies the response format type and an optional JSON schema.
Response types:
- ChatCompletionStreamResponse -- A streamed response chunk containing an id, a list of choices, and optional usage statistics.
- ChatCompletionStreamResponseChoice -- A single choice in a streamed response with a delta message, finish_reason, and optional log probabilities.
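As an illustration of how a client might consume these chunks, the following minimal sketch concatenates the delta text of one choice until a finish_reason arrives. StreamDelta, StreamChoice, and StreamChunk are simplified, hypothetical stand-ins for the real classes (which carry more fields and wrap content in ChatCompletionMessageContent), and collectText is an illustrative helper, not part of OpenAIProtocol.

```kotlin
// Simplified, hypothetical stand-ins for the streamed response types.
data class StreamDelta(val content: String? = null)
data class StreamChoice(val index: Int, val delta: StreamDelta, val finish_reason: String? = null)
data class StreamChunk(val id: String, val choices: List<StreamChoice>)

// Concatenates the delta text of choice 0 across chunks and stops once that
// choice reports a non-null finish_reason. Chunks without choice 0 are skipped.
fun collectText(chunks: Iterable<StreamChunk>): String {
    val text = StringBuilder()
    for (chunk in chunks) {
        val choice = chunk.choices.firstOrNull { it.index == 0 } ?: continue
        choice.delta.content?.let { text.append(it) }
        if (choice.finish_reason != null) break
    }
    return text.toString()
}

fun main() {
    val chunks = listOf(
        StreamChunk("chunk-1", listOf(StreamChoice(0, StreamDelta("Hel")))),
        StreamChunk("chunk-1", listOf(StreamChoice(0, StreamDelta("lo")))),
        StreamChunk("chunk-1", listOf(StreamChoice(0, StreamDelta(), finish_reason = "stop")))
    )
    println(collectText(chunks)) // prints "Hello"
}
```

Skipping chunks that lack the tracked choice keeps the loop tolerant of chunks that carry only usage statistics.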
Tool calling types:
- ChatTool -- Defines a tool with type "function" and a ChatFunction.
- ChatFunction -- Describes a function with name, optional description, and parameters.
- ChatToolCall -- Represents a tool call with a UUID, type, and a ChatFunctionCall.
- ChatFunctionCall -- Contains the function name and optional arguments.
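One way an application could act on a received tool call is to look up a handler by function name and pass it the string-keyed arguments. The sketch below assumes that pattern; FunctionCall and ToolCall are hypothetical stand-ins mirroring the shapes above, and dispatch and the get_weather handler are invented for illustration.

```kotlin
// Hypothetical stand-ins for ChatFunctionCall and ChatToolCall.
data class FunctionCall(val name: String, val arguments: Map<String, String>? = null)
data class ToolCall(val id: String, val type: String = "function", val function: FunctionCall)

// Looks up a handler by function name and invokes it with the call's arguments.
// Missing arguments are treated as an empty map.
fun dispatch(call: ToolCall, handlers: Map<String, (Map<String, String>) -> String>): String {
    val handler = handlers[call.function.name]
        ?: return "error: unknown function ${call.function.name}"
    return handler(call.function.arguments.orEmpty())
}

fun main() {
    val handlers: Map<String, (Map<String, String>) -> String> = mapOf(
        "get_weather" to { args: Map<String, String> -> "Sunny in ${args["city"] ?: "unknown"}" }
    )
    val call = ToolCall(id = "call-1", function = FunctionCall("get_weather", mapOf("city" to "Paris")))
    println(dispatch(call, handlers)) // prints "Sunny in Paris"
}
```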
Log probability types:
- TopLogProbs -- Token-level top log probabilities.
- LogProbsContent -- Log probability for a token with optional top alternatives.
- LogProbs -- Container for log probability content.
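A log probability converts back to a probability with exp. The sketch below assumes logprob is a natural logarithm, as in the OpenAI API; TokenAlternative and TokenLogProb are hypothetical stand-ins for TopLogProbs and LogProbsContent, and the top_logprobs field name is an assumption.

```kotlin
import kotlin.math.exp

// Hypothetical stand-ins for TopLogProbs and LogProbsContent.
data class TokenAlternative(val token: String, val logprob: Float)
data class TokenLogProb(
    val token: String,
    val logprob: Float,
    val top_logprobs: List<TokenAlternative> = emptyList()
)

// Converts a natural-log probability back to a probability in [0, 1].
fun probability(logprob: Float): Double = exp(logprob.toDouble())

fun main() {
    val entry = TokenLogProb(
        token = "Hello",
        logprob = -0.1f,
        top_logprobs = listOf(TokenAlternative("Hello", -0.1f), TokenAlternative("Hi", -2.5f))
    )
    // Print each alternative with its probability.
    for (alt in entry.top_logprobs) {
        println("${alt.token}: ${probability(alt.logprob)}")
    }
}
```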
Usage types:
- CompletionUsage -- Token counts for prompt, completion, and total, plus optional CompletionUsageExtra.
- CompletionUsageExtra -- Extended performance metrics including prefill_tokens_per_s and decode_tokens_per_s, with an asTextLabel() helper for display.
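A display helper of this kind could be written as follows. This is a sketch, not the actual implementation: UsageExtra is a hypothetical stand-in with only the two documented fields, and the label format is an assumption based on the sample output shown under Usage Examples.

```kotlin
import java.util.Locale

// Hypothetical stand-in for CompletionUsageExtra with only the two documented fields.
data class UsageExtra(
    val prefill_tokens_per_s: Float? = null,
    val decode_tokens_per_s: Float? = null
) {
    // Joins whichever metrics are present into one display string.
    // Locale.US keeps the decimal separator a dot regardless of device locale.
    fun asTextLabel(): String {
        val parts = mutableListOf<String>()
        prefill_tokens_per_s?.let { parts.add(String.format(Locale.US, "prefill: %.1f tok/s", it)) }
        decode_tokens_per_s?.let { parts.add(String.format(Locale.US, "decode: %.1f tok/s", it)) }
        return parts.joinToString(", ")
    }
}

fun main() {
    println(UsageExtra(128.5f, 45.3f).asTextLabel())
    // prints "prefill: 128.5 tok/s, decode: 45.3 tok/s"
}
```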
Enum:
- ChatCompletionRole -- Enum with values: system, user, assistant, tool.
The custom ChatCompletionMessageContentSerializer handles the polymorphic serialization: text content is serialized as a plain JSON string, while multi-part content is serialized as a JSON array of maps. Deserialization inspects the JsonElement type to determine the format.
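The approach described above can be sketched with kotlinx.serialization's JSON element API. This is a hedged illustration rather than the actual serializer: MessageContent and MessageContentSerializer are hypothetical names, the "type"/"text" part keys are only an example, and the serializer is passed explicitly so the sketch needs the runtime library but not the @Serializable compiler plugin.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.SerializationException
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.jsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical stand-in for ChatCompletionMessageContent: one of the two fields is set.
data class MessageContent(val text: String? = null, val parts: List<Map<String, String>>? = null)

object MessageContentSerializer : KSerializer<MessageContent> {
    override val descriptor: SerialDescriptor = buildClassSerialDescriptor("MessageContent")

    // Text becomes a plain JSON string; parts become a JSON array of objects.
    override fun serialize(encoder: Encoder, value: MessageContent) {
        val jsonEncoder = encoder as? JsonEncoder
            ?: throw SerializationException("This sketch only supports JSON")
        val element = if (value.text != null) {
            JsonPrimitive(value.text)
        } else {
            JsonArray(value.parts.orEmpty().map { part ->
                JsonObject(part.mapValues { JsonPrimitive(it.value) })
            })
        }
        jsonEncoder.encodeJsonElement(element)
    }

    // Inspects the decoded JsonElement to decide which form was received.
    override fun deserialize(decoder: Decoder): MessageContent {
        val jsonDecoder = decoder as? JsonDecoder
            ?: throw SerializationException("This sketch only supports JSON")
        val element = jsonDecoder.decodeJsonElement()
        return when {
            element is JsonPrimitive && element.isString -> MessageContent(text = element.content)
            element is JsonArray -> MessageContent(parts = element.map { part ->
                part.jsonObject.mapValues { it.value.jsonPrimitive.content }
            })
            else -> throw SerializationException("Expected a JSON string or array")
        }
    }
}

fun main() {
    val text = Json.encodeToString(MessageContentSerializer, MessageContent(text = "hi"))
    println(text) // prints "hi" including the JSON quotes

    val parts = Json.encodeToString(
        MessageContentSerializer,
        MessageContent(parts = listOf(mapOf("type" to "text", "text" to "hi")))
    )
    println(parts) // prints [{"type":"text","text":"hi"}]

    // Round trip: the array form decodes back into the parts field.
    println(Json.decodeFromString(MessageContentSerializer, parts))
}
```

Working at the JsonElement level ties the serializer to JSON, which seems acceptable here because the classes exist to exchange JSON with the engine.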
Usage
These data classes are used throughout the Android client to construct chat completion requests (serialized to JSON for the native FFI engine) and to parse streamed responses back into typed Kotlin objects. They ensure type-safe interaction with the OpenAI-compatible API surface exposed by MLC LLM.
Code Reference
Source Location
- Repository: Mlc_ai_Mlc_llm
- File: android/mlc4j/src/main/java/ai/mlc/mlcllm/OpenAIProtocol.kt
Signature
class OpenAIProtocol {
    data class TopLogProbs(val token: String, val logprob: Float, val bytes: List<Int>? = null)
    data class LogProbsContent(val token: String, val logprob: Float, ...)
    data class LogProbs(var content: List<LogProbsContent> = listOf())
    data class ChatFunction(val name: String, var description: String? = null, val parameters: Map<String, String>)
    data class ChatTool(val type: String = "function", val function: ChatFunction)
    data class ChatFunctionCall(val name: String, var arguments: Map<String, String>? = null)
    data class ChatToolCall(val id: String, val type: String = "function", val function: ChatFunctionCall)
    enum class ChatCompletionRole { system, user, assistant, tool }
    data class ChatCompletionMessageContent(val text: String? = null, val parts: List<Map<String, String>>? = null)
    data class ChatCompletionMessage(val role: ChatCompletionRole, var content: ChatCompletionMessageContent?, ...)
    data class CompletionUsageExtra(val prefill_tokens_per_s: Float?, val decode_tokens_per_s: Float?, ...)
    data class CompletionUsage(val prompt_tokens: Int, val completion_tokens: Int, val total_tokens: Int, ...)
    data class StreamOptions(val include_usage: Boolean = false)
    data class ChatCompletionStreamResponseChoice(var finish_reason: String?, val index: Int, val delta: ChatCompletionMessage, ...)
    data class ChatCompletionStreamResponse(val id: String, var choices: List<ChatCompletionStreamResponseChoice>, ...)
    data class ChatCompletionRequest(val messages: List<ChatCompletionMessage>, val model: String?, ...)
    data class ResponseFormat(val type: String, val schema: String? = null)
}
Import
import ai.mlc.mlcllm.OpenAIProtocol
I/O Contract
| Data Class | Purpose | Key Fields |
|---|---|---|
| ChatCompletionRequest | Serialized to JSON and sent to the engine | messages, model, temperature, top_p, max_tokens, stream, tools |
| ChatCompletionStreamResponse | Deserialized from engine stream callback | id, choices, usage, model |
| ChatCompletionMessage | Used in both requests and response deltas | role, content, tool_calls |
| ChatCompletionMessageContent | Polymorphic content (text or parts) | text (String) or parts (List of Maps) |
| CompletionUsage | Token usage statistics in responses | prompt_tokens, completion_tokens, total_tokens |
| CompletionUsageExtra | Performance metrics | prefill_tokens_per_s, decode_tokens_per_s |
Usage Examples
import ai.mlc.mlcllm.OpenAIProtocol
import android.util.Log
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Build a chat completion request
val request = OpenAIProtocol.ChatCompletionRequest(
    messages = listOf(
        OpenAIProtocol.ChatCompletionMessage(
            role = OpenAIProtocol.ChatCompletionRole.user,
            content = "Hello, how are you?"
        )
    ),
    model = "Llama-3-8B-q4f16_1",
    temperature = 0.7f,
    top_p = 0.95f,
    stream = true
)

// Serialize to JSON for the FFI engine
val requestJson = Json.encodeToString(request)

// Parse a streamed response
// (responseJson is assumed to hold one JSON chunk delivered by the engine's stream callback)
val response = Json.decodeFromString<OpenAIProtocol.ChatCompletionStreamResponse>(responseJson)
val deltaText = response.choices.firstOrNull()?.delta?.content?.asText()

// Display performance metrics
response.usage?.extra?.let { extra ->
    Log.d("MLCChat", extra.asTextLabel())
    // e.g., "prefill: 128.5 tok/s, decode: 45.3 tok/s"
}