Implementation:Mlc ai Mlc llm OpenAIProtocol Kotlin

From Leeroopedia


Knowledge Sources
Domains: Android, API Protocol, Serialization, OpenAI Compatibility
Last Updated: 2026-02-09 19:00 GMT

Overview

OpenAIProtocol (Kotlin) defines the serializable data classes for the OpenAI-compatible chat completion API used by the MLC LLM Android client, covering requests, responses, tool calls, log probabilities, and usage statistics.

Description

The OpenAIProtocol class is a container for a set of Kotlin @Serializable data classes that mirror the OpenAI Chat Completions API. These classes are used for both serializing requests to the native engine and deserializing streamed responses. The key types defined are:

Request types:

  • ChatCompletionRequest -- The main request object containing messages, model name, sampling parameters (temperature, top_p, frequency_penalty, presence_penalty), max_tokens, stop sequences, stream flag, tools, and response_format.
  • ChatCompletionMessage -- A single message with a role (system/user/assistant/tool), optional content, name, tool_calls, and tool_call_id.
  • ChatCompletionMessageContent -- A polymorphic content type that can be either plain text or a list of parts (maps), with a custom serializer (ChatCompletionMessageContentSerializer) that handles both forms.
  • ResponseFormat -- Specifies the response format type and an optional JSON schema, passed as a string.
  • StreamOptions -- Holds the include_usage flag (default false).

Response types:

  • ChatCompletionStreamResponse -- A streamed response chunk containing an id, a list of choices, and optional usage statistics.
  • ChatCompletionStreamResponseChoice -- A single choice in a streamed response with a delta message, finish_reason, and optional log probabilities.
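
Because each streamed chunk carries only a delta, a client usually concatenates the delta content across chunks until a finish_reason appears. The following is a minimal sketch of that loop, not the client's actual code. Delta, Choice, Chunk, and collectText are hypothetical stand-ins for the OpenAIProtocol types, reduced to the fields named above.

```kotlin
// Hypothetical, simplified stand-ins for the streamed response types.
data class Delta(val content: String? = null)
data class Choice(val index: Int, val delta: Delta, val finish_reason: String? = null)
data class Chunk(val id: String, val choices: List<Choice>)

// Concatenates the first choice's delta text until a chunk reports a finish_reason.
// Returns the accumulated text and the finish reason (null if the stream never finished).
fun collectText(chunks: Iterable<Chunk>): Pair<String, String?> {
    val text = StringBuilder()
    var finishReason: String? = null
    for (chunk in chunks) {
        // Skip chunks without choices, for example a usage-only chunk.
        val choice = chunk.choices.firstOrNull() ?: continue
        choice.delta.content?.let { text.append(it) }
        if (choice.finish_reason != null) {
            finishReason = choice.finish_reason
            break
        }
    }
    return text.toString() to finishReason
}
```

Calling collectText on chunks whose deltas are "Hel" and "lo" (the second with finish_reason "stop") yields the pair ("Hello", "stop").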

Tool calling types:

  • ChatTool -- Defines a tool with type "function" and a ChatFunction.
  • ChatFunction -- Describes a function with name, optional description, and parameters.
  • ChatToolCall -- Represents a tool call with a string id (a UUID), a type, and a ChatFunctionCall.
  • ChatFunctionCall -- Contains the function name and optional arguments.
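
One way a client might act on a received tool call is to look up a handler by function name and pass it the arguments map. The sketch below assumes that pattern. The handlers registry, the get_weather function, and dispatch are hypothetical and do not come from the MLC LLM source. The two data classes mirror the signatures listed under Code Reference.

```kotlin
// Mirrors of the documented tool-call types, without serialization annotations.
data class ChatFunctionCall(val name: String, var arguments: Map<String, String>? = null)
data class ChatToolCall(val id: String, val type: String = "function", val function: ChatFunctionCall)

// Hypothetical registry mapping function names to local handlers.
val handlers: Map<String, (Map<String, String>) -> String> = mapOf(
    "get_weather" to { args: Map<String, String> -> "Sunny in ${args["city"] ?: "unknown"}" }
)

// Returns the handler's result, or null when no handler is registered for the call.
fun dispatch(call: ChatToolCall): String? {
    val handler = handlers[call.function.name] ?: return null
    return handler(call.function.arguments ?: emptyMap())
}
```

Returning null for an unknown function lets the caller decide whether to report the error back to the model or ignore the call.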

Log probability types:

  • TopLogProbs -- A candidate token with its log probability and an optional byte representation.
  • LogProbsContent -- Log probability for a generated token, with optional top alternatives.
  • LogProbs -- Container holding a list of LogProbsContent entries.
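
To make these fields concrete, the sketch below converts a log probability into a plain probability and decodes the optional bytes list. It assumes the OpenAI convention that logprob is a natural logarithm and that bytes holds UTF-8 byte values. probabilityOf and displayText are hypothetical helpers, not part of OpenAIProtocol.

```kotlin
import kotlin.math.exp

// Mirror of the documented TopLogProbs signature.
data class TopLogProbs(val token: String, val logprob: Float, val bytes: List<Int>? = null)

// Assumes a natural-log probability, so exp() recovers a value in (0, 1].
fun probabilityOf(entry: TopLogProbs): Double = exp(entry.logprob.toDouble())

// Decodes the UTF-8 bytes when present; falls back to the token string otherwise.
fun displayText(entry: TopLogProbs): String {
    val raw = entry.bytes ?: return entry.token
    return raw.map { it.toByte() }.toByteArray().toString(Charsets.UTF_8)
}
```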

Usage types:

  • CompletionUsage -- Token counts for prompt, completion, and total, plus optional CompletionUsageExtra.
  • CompletionUsageExtra -- Extended performance metrics including prefill_tokens_per_s and decode_tokens_per_s, with an asTextLabel() helper for display.
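
A helper such as asTextLabel() could be written along the following lines. This is a sketch under assumptions, not the library's implementation. UsageExtra is a hypothetical stand-in for CompletionUsageExtra, and the label format is modeled on the sample output shown in the usage example on this page.

```kotlin
import java.util.Locale

// Hypothetical stand-in for CompletionUsageExtra with only the two documented metrics.
data class UsageExtra(
    val prefill_tokens_per_s: Float? = null,
    val decode_tokens_per_s: Float? = null
) {
    // Builds a label from whichever metrics are present, separated by ", ".
    fun asTextLabel(): String {
        val parts = mutableListOf<String>()
        if (prefill_tokens_per_s != null) {
            parts.add(String.format(Locale.US, "prefill: %.1f tok/s", prefill_tokens_per_s))
        }
        if (decode_tokens_per_s != null) {
            parts.add(String.format(Locale.US, "decode: %.1f tok/s", decode_tokens_per_s))
        }
        return parts.joinToString(", ")
    }
}
```

Fixing the locale to Locale.US keeps the decimal separator a period regardless of the device's language settings.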

Enum:

  • ChatCompletionRole -- Enum with values: system, user, assistant, tool.

The custom ChatCompletionMessageContentSerializer handles the polymorphic serialization: text content is serialized as a plain JSON string, while multi-part content is serialized as a JSON array of maps. Deserialization inspects the JsonElement type to determine the format.
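
The behavior described above can be sketched with a hand-written KSerializer from kotlinx.serialization. This is an illustration only, not the serializer shipped with MLC LLM. Content and ContentSerializer are hypothetical names, and the sketch supports JSON only.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.SerializationException
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonPrimitive

// Hypothetical stand-in for ChatCompletionMessageContent: exactly one field is expected to be set.
data class Content(val text: String? = null, val parts: List<Map<String, String>>? = null)

object ContentSerializer : KSerializer<Content> {
    private val partsSerializer =
        ListSerializer(MapSerializer(String.serializer(), String.serializer()))

    override val descriptor: SerialDescriptor = buildClassSerialDescriptor("Content")

    // Text becomes a plain JSON string; parts become a JSON array of objects.
    override fun serialize(encoder: Encoder, value: Content) {
        when {
            value.text != null -> encoder.encodeString(value.text)
            value.parts != null -> encoder.encodeSerializableValue(partsSerializer, value.parts)
            else -> throw SerializationException("content has neither text nor parts")
        }
    }

    // Inspects the JsonElement type to decide which form was received.
    override fun deserialize(decoder: Decoder): Content {
        val jsonDecoder = decoder as? JsonDecoder
            ?: throw SerializationException("this sketch only supports JSON")
        return when (val element = jsonDecoder.decodeJsonElement()) {
            is JsonPrimitive -> Content(text = element.content)
            is JsonArray -> Content(
                parts = jsonDecoder.json.decodeFromJsonElement(partsSerializer, element)
            )
            else -> throw SerializationException("unexpected JSON element for content")
        }
    }
}
```

With this sketch, Json.encodeToString(ContentSerializer, Content(text = "hi")) produces a bare JSON string, and Json.decodeFromString(ContentSerializer, ...) accepts either a string or an array of objects.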

Usage

These data classes are used throughout the Android client to construct chat completion requests (serialized to JSON for the native FFI engine) and to parse streamed responses back into typed Kotlin objects. They ensure type-safe interaction with the OpenAI-compatible API surface exposed by MLC LLM.

Code Reference

Source Location

Signature

class OpenAIProtocol {
    data class TopLogProbs(val token: String, val logprob: Float, val bytes: List<Int>? = null)
    data class LogProbsContent(val token: String, val logprob: Float, ...)
    data class LogProbs(var content: List<LogProbsContent> = listOf())
    data class ChatFunction(val name: String, var description: String? = null, val parameters: Map<String, String>)
    data class ChatTool(val type: String = "function", val function: ChatFunction)
    data class ChatFunctionCall(val name: String, var arguments: Map<String, String>? = null)
    data class ChatToolCall(val id: String, val type: String = "function", val function: ChatFunctionCall)
    enum class ChatCompletionRole { system, user, assistant, tool }
    data class ChatCompletionMessageContent(val text: String? = null, val parts: List<Map<String, String>>? = null)
    data class ChatCompletionMessage(val role: ChatCompletionRole, var content: ChatCompletionMessageContent?, ...)
    data class CompletionUsageExtra(val prefill_tokens_per_s: Float?, val decode_tokens_per_s: Float?, ...)
    data class CompletionUsage(val prompt_tokens: Int, val completion_tokens: Int, val total_tokens: Int, ...)
    data class StreamOptions(val include_usage: Boolean = false)
    data class ChatCompletionStreamResponseChoice(var finish_reason: String?, val index: Int, val delta: ChatCompletionMessage, ...)
    data class ChatCompletionStreamResponse(val id: String, var choices: List<ChatCompletionStreamResponseChoice>, ...)
    data class ChatCompletionRequest(val messages: List<ChatCompletionMessage>, val model: String?, ...)
    data class ResponseFormat(val type: String, val schema: String? = null)
}

Import

import ai.mlc.mlcllm.OpenAIProtocol

I/O Contract

Data Class | Purpose | Key Fields
--- | --- | ---
ChatCompletionRequest | Serialized to JSON and sent to the engine | messages, model, temperature, top_p, max_tokens, stream, tools
ChatCompletionStreamResponse | Deserialized from engine stream callback | id, choices, usage, model
ChatCompletionMessage | Used in both requests and response deltas | role, content, tool_calls
ChatCompletionMessageContent | Polymorphic content (text or parts) | text (String) or parts (List of Maps)
CompletionUsage | Token usage statistics in responses | prompt_tokens, completion_tokens, total_tokens
CompletionUsageExtra | Performance metrics | prefill_tokens_per_s, decode_tokens_per_s

Usage Examples

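import ai.mlc.mlcllm.OpenAIProtocol
import android.util.Log
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Build a chat completion request
val request = OpenAIProtocol.ChatCompletionRequest(
    messages = listOf(
        OpenAIProtocol.ChatCompletionMessage(
            role = OpenAIProtocol.ChatCompletionRole.user,
            // Wrap plain text in the polymorphic content type
            content = OpenAIProtocol.ChatCompletionMessageContent(text = "Hello, how are you?")
        )
    ),
    model = "Llama-3-8B-q4f16_1",
    temperature = 0.7f,
    top_p = 0.95f,
    stream = true
)

// Serialize to JSON for the FFI engine
val requestJson = Json.encodeToString(request)

// Parse a streamed response
// (responseJson is the JSON string for one chunk, as delivered by the engine's stream callback)
val response = Json.decodeFromString<OpenAIProtocol.ChatCompletionStreamResponse>(responseJson)
val deltaText = response.choices.firstOrNull()?.delta?.content?.asText()

// Display performance metrics
response.usage?.extra?.let { extra ->
    Log.d("MLCChat", extra.asTextLabel())
    // e.g., "prefill: 128.5 tok/s, decode: 45.3 tok/s"
}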
