Implementation: Mlc_ai_Mlc_llm JSONFFIEngine (Java)
| Knowledge Sources | |
|---|---|
| Domains | Android, FFI, LLM Inference, TVM Runtime |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
JSONFFIEngine (Java) is a Java wrapper around the MLC LLM JSON FFI engine, providing a Java-accessible interface to the native TVM-based LLM inference engine via TVM's Java runtime bindings.
Description
The JSONFFIEngine class uses TVM's Java runtime (org.apache.tvm) to create and manage a native JSON FFI engine instance. On construction, it calls the TVM function mlc.json_ffi.CreateJSONFFIEngine to instantiate the underlying engine module, then resolves references to the following TVM-registered functions:
- initBackgroundEngineFunc -- Initializes the background engine with a device and a stream callback
- reloadFunc -- Reloads the engine with a new configuration JSON string
- unloadFunc -- Unloads the current model
- resetFunc -- Resets the engine state
- chatCompletionFunc -- Submits a chat completion request with a JSON string and request ID
- abortFunc -- Aborts a running request
- getLastErrorFunc -- Retrieves the last error message
- runBackgroundLoopFunc -- Runs the main background processing loop
- runBackgroundStreamBackLoopFunc -- Runs the background stream-back loop for streaming responses
- exitBackgroundLoopFunc -- Exits the background loop
The initBackgroundEngine method sets up an OpenCL device and wraps a KotlinFunction callback (a functional interface accepting a String) into a TVM Function.Callback that receives streamed chat completion response JSON strings.
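The callback bridging described above can be sketched without the TVM dependency. In this sketch, `NativeCallback` is a hypothetical stand-in for TVM's `Function.Callback` (it is not the real TVM type), and `wrap` mirrors the kind of adaptation `initBackgroundEngine` performs when it turns a `KotlinFunction` into a native-runtime callback:

```java
// Self-contained sketch of the callback bridge (no TVM dependency).
// NativeCallback is a hypothetical stand-in for TVM's Function.Callback;
// the real engine performs a similar adaptation before handing the
// callback to the native background engine.
public class CallbackBridgeSketch {
    interface KotlinFunction { void invoke(String arg); }
    interface NativeCallback { Object invoke(Object... args); }

    // Adapt the String-based Java/Kotlin callback into the varargs form a
    // native runtime callback expects; the first argument is assumed to be
    // the streamed chat completion response JSON.
    static NativeCallback wrap(KotlinFunction f) {
        return args -> { f.invoke(args[0].toString()); return null; };
    }

    public static void main(String[] args) {
        NativeCallback cb = wrap(json -> System.out.println("chunk: " + json));
        cb.invoke("{\"choices\":[]}"); // simulates one streamed response
    }
}
```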
Usage
This class is used by the Android application layer (typically from Kotlin code via the KotlinFunction interface) to interact with the native MLC LLM inference engine. It is the primary bridge between the Android UI layer and the compiled model backend.
Code Reference
Source Location
- Repository: Mlc_ai_Mlc_llm
- File: android/mlc4j/src/main/java/ai/mlc/mlcllm/JSONFFIEngine.java
Signature
```java
public class JSONFFIEngine {
    public JSONFFIEngine()
    public void initBackgroundEngine(KotlinFunction callback)
    public void reload(String engineConfigJSONStr)
    public void chatCompletion(String requestJSONStr, String requestId)
    public void runBackgroundLoop()
    public void runBackgroundStreamBackLoop()
    public void exitBackgroundLoop()
    public void unload()
    public void reset()

    public interface KotlinFunction {
        void invoke(String arg);
    }
}
```
Import
```java
import ai.mlc.mlcllm.JSONFFIEngine;
```
I/O Contract
| Method | Input | Type | Description |
|---|---|---|---|
| initBackgroundEngine | callback | KotlinFunction | Callback invoked with a JSON string for each streamed chat completion response |
| reload | engineConfigJSONStr | String | JSON configuration string for the engine (model path, parameters, etc.) |
| chatCompletion | requestJSONStr | String | JSON string of the chat completion request, following the OpenAI protocol |
| chatCompletion | requestId | String | Unique identifier for the request |
| Method | Output / Side Effect | Description |
|---|---|---|
| initBackgroundEngine | Engine initialized | Creates an OpenCL device and registers the stream callback with the native engine |
| reload | Model loaded | Loads a new model configuration into the engine |
| chatCompletion | Inference started | Begins processing a chat completion, responses streamed via the callback |
| runBackgroundLoop | Blocking loop | Runs the engine's main processing loop (blocks the calling thread) |
| runBackgroundStreamBackLoop | Blocking loop | Runs the stream-back loop that delivers responses (blocks the calling thread) |
| exitBackgroundLoop | Loop terminated | Signals the background loop to exit |
| unload | Model unloaded | Releases the currently loaded model from memory |
| reset | State cleared | Resets the engine to its initial state |
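Since chatCompletion takes the request as a raw JSON string, a minimal body can be assembled in plain Java. The field names below (`messages`, `role`, `content`, `stream`) follow the OpenAI chat completion schema the contract references; they are illustrative and not verified against the engine's parser:

```java
public class RequestJsonExample {
    public static void main(String[] args) {
        // Minimal OpenAI-style chat completion request. In production,
        // build this with a JSON library (e.g. Gson) rather than by hand.
        String requestJson =
            "{\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}],"
            + "\"stream\":true}";
        System.out.println(requestJson);
        // This string would then be submitted as, e.g.:
        // engine.chatCompletion(requestJson, "request-001");
    }
}
```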
Usage Examples
```java
// Create the engine
JSONFFIEngine engine = new JSONFFIEngine();

// Initialize with a callback for streaming responses
engine.initBackgroundEngine(responseJson -> {
    // Handle a streamed chat completion response JSON string
    Log.d("MLCChat", "Response: " + responseJson);
});

// Run the background loops on dedicated threads; both calls block their
// thread, so start them before submitting requests
new Thread(() -> engine.runBackgroundLoop()).start();
new Thread(() -> engine.runBackgroundStreamBackLoop()).start();

// Load a model
engine.reload(engineConfigJson);

// Submit a chat completion request
engine.chatCompletion(requestJson, "request-001");

// When all requests have finished, clean up
engine.exitBackgroundLoop();
engine.unload();
```
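The callback delivers raw JSON, so the application must parse each streamed chunk itself. The chunk shape below assumes OpenAI-style streaming deltas (an assumption, not confirmed by this page), and the naive string scan is only for illustration; a real app should use a JSON library such as Gson or org.json:

```java
public class StreamChunkParseSketch {
    // Naive extraction of the "content" field from a streamed chunk.
    // Assumes an OpenAI-style delta payload; use a real JSON parser
    // in production code.
    static String extractContent(String chunkJson) {
        String key = "\"content\":\"";
        int i = chunkJson.indexOf(key);
        if (i < 0) return "";
        int start = i + key.length();
        int end = chunkJson.indexOf('"', start);
        return chunkJson.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical streamed chunk as it might arrive in the callback
        String chunk = "{\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}";
        System.out.println(extractContent(chunk)); // prints Hi
    }
}
```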