
Implementation: mlc-ai/mlc-llm JSONFFIEngine (Java)

From Leeroopedia


Knowledge Sources
Domains: Android, FFI, LLM Inference, TVM Runtime
Last Updated: 2026-02-09 19:00 GMT

Overview

JSONFFIEngine is a Java wrapper around the MLC LLM JSON FFI engine, exposing the native TVM-based LLM inference engine to Java code through TVM's Java runtime bindings (org.apache.tvm).

Description

The JSONFFIEngine class uses TVM's Java runtime (org.apache.tvm) to create and manage a native JSON FFI engine instance. On construction, it calls the TVM function mlc.json_ffi.CreateJSONFFIEngine to instantiate the underlying engine module, then resolves references to the following TVM-registered functions:

  • initBackgroundEngineFunc -- Initializes the background engine with a device and a stream callback
  • reloadFunc -- Reloads the engine with a new configuration JSON string
  • unloadFunc -- Unloads the current model
  • resetFunc -- Resets the engine state
  • chatCompletionFunc -- Submits a chat completion request with a JSON string and request ID
  • abortFunc -- Aborts a running request
  • getLastErrorFunc -- Retrieves the last error message
  • runBackgroundLoopFunc -- Runs the main background processing loop
  • runBackgroundStreamBackLoopFunc -- Runs the background stream-back loop for streaming responses
  • exitBackgroundLoopFunc -- Exits the background loop

The initBackgroundEngine method sets up an OpenCL device and wraps a KotlinFunction callback (a functional interface accepting a String) into a TVM Function.Callback that receives streamed chat completion response JSON strings.
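Because KotlinFunction is a plain functional interface with a single String parameter, any Java lambda (or Kotlin lambda) satisfies it. The sketch below illustrates this shape in isolation; the interface is reproduced locally so the snippet compiles without the MLC LLM or TVM dependencies, and the class name CallbackDemo is illustrative:

```java
public class CallbackDemo {
    // Local mirror of JSONFFIEngine.KotlinFunction, reproduced here so this
    // sketch is self-contained (the real interface lives in ai.mlc.mlcllm).
    public interface KotlinFunction {
        void invoke(String arg);
    }

    // Collects every streamed response chunk the callback receives.
    static final StringBuilder received = new StringBuilder();

    public static void main(String[] args) {
        // Any lambda taking a String satisfies the functional interface;
        // JSONFFIEngine wraps such a callback into a TVM Function.Callback
        // internally before handing it to the native engine.
        KotlinFunction onResponse = json -> received.append(json);
        onResponse.invoke("{\"choices\":[]}");
        System.out.println(received);
    }
}
```

In the real engine, the wrapped callback is invoked once per streamed response chunk, so the handler should be cheap and thread-safe.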

Usage

This class is used by the Android application layer (typically from Kotlin code via the KotlinFunction interface) to interact with the native MLC LLM inference engine. It is the primary bridge between the Android UI layer and the compiled model backend.
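Since runBackgroundLoop and runBackgroundStreamBackLoop both block their calling thread, the application layer typically hosts each on a dedicated thread and uses exitBackgroundLoop to shut them down. The following self-contained sketch models that lifecycle with a stub engine (StubEngine and the CountDownLatch-based loop are stand-ins for the native engine, not part of the real API):

```java
import java.util.concurrent.CountDownLatch;

public class LoopHostDemo {
    // Stub standing in for the native engine's blocking background loop:
    // runBackgroundLoop() blocks until exitBackgroundLoop() is called.
    static class StubEngine {
        final CountDownLatch exit = new CountDownLatch(1);

        void runBackgroundLoop() {
            try {
                exit.await(); // blocks the hosting thread, like the real loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void exitBackgroundLoop() {
            exit.countDown(); // signals the loop to return
        }
    }

    // Starts the loop on its own thread, signals shutdown, and reports
    // whether the loop thread actually terminated.
    public static boolean startAndStop() {
        StubEngine engine = new StubEngine();
        Thread loop = new Thread(engine::runBackgroundLoop, "mlc-background-loop");
        loop.start();
        engine.exitBackgroundLoop();
        try {
            loop.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !loop.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(startAndStop() ? "stopped" : "still running");
    }
}
```

The real engine needs two such threads, one per loop; the shutdown signal pattern is the same.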

Code Reference

Source Location

Signature

public class JSONFFIEngine {
    public JSONFFIEngine()
    public void initBackgroundEngine(KotlinFunction callback)
    public void reload(String engineConfigJSONStr)
    public void chatCompletion(String requestJSONStr, String requestId)
    public void runBackgroundLoop()
    public void runBackgroundStreamBackLoop()
    public void exitBackgroundLoop()
    public void unload()
    public void reset()

    public interface KotlinFunction {
        void invoke(String arg);
    }
}

Import

import ai.mlc.mlcllm.JSONFFIEngine;

I/O Contract

Inputs

Method                Parameter            Type            Description
initBackgroundEngine  callback             KotlinFunction  Callback invoked with a JSON string for each streamed chat completion response
reload                engineConfigJSONStr  String          JSON configuration string for the engine (model path, parameters, etc.)
chatCompletion        requestJSONStr       String          JSON string of the chat completion request, following the OpenAI protocol
chatCompletion        requestId            String          Unique identifier for the request

Outputs / Side Effects

Method                       Side Effect         Description
initBackgroundEngine         Engine initialized  Creates an OpenCL device and registers the stream callback with the native engine
reload                       Model loaded        Loads a new model configuration into the engine
chatCompletion               Inference started   Begins processing a chat completion; responses are streamed via the callback
runBackgroundLoop            Blocking loop       Runs the engine's main processing loop (blocks the calling thread)
runBackgroundStreamBackLoop  Blocking loop       Runs the stream-back loop that delivers responses (blocks the calling thread)
exitBackgroundLoop           Loop terminated     Signals the background loops to exit
unload                       Model unloaded      Releases the currently loaded model from memory
reset                        State cleared       Resets the engine to its initial state
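The requestJSONStr passed to chatCompletion follows the OpenAI chat completion protocol. A minimal sketch of assembling such a request string in plain Java is shown below; ChatRequestBuilder is a hypothetical helper, not part of the JSONFFIEngine API, and a real application would normally use a JSON library instead of manual string building:

```java
public class ChatRequestBuilder {
    // Minimal JSON string escaping for backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a minimal OpenAI-style chat completion request with a single
    // user message; "stream" requests streamed response chunks.
    public static String build(String model, String userMessage, boolean stream) {
        return "{"
            + "\"model\":\"" + escape(model) + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":\""
            + escape(userMessage) + "\"}],"
            + "\"stream\":" + stream
            + "}";
    }

    public static void main(String[] args) {
        // The model identifier here is illustrative only.
        System.out.println(build("some-model-id", "Hello!", true));
    }
}
```

The resulting string would be passed as the first argument to chatCompletion, alongside an application-chosen request ID.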

Usage Examples

// Create and initialize the engine
JSONFFIEngine engine = new JSONFFIEngine();

// Register a callback for streamed chat completion responses
engine.initBackgroundEngine(responseJson -> {
    // Handle streamed chat completion response JSON
    Log.d("MLCChat", "Response: " + responseJson);
});

// Host the two blocking background loops on dedicated threads so the
// engine can process requests and stream responses back
new Thread(engine::runBackgroundLoop).start();
new Thread(engine::runBackgroundStreamBackLoop).start();

// Load a model
engine.reload(engineConfigJson);

// Submit a chat completion request
engine.chatCompletion(requestJson, "request-001");

// When done, clean up
engine.exitBackgroundLoop();
engine.unload();
