
Implementation: mlc-ai/mlc-llm JSONFFIEngine (Java)

From Leeroopedia


Knowledge Sources
Domains: Android, FFI, LLM Inference, TVM Runtime
Last Updated: 2026-02-09 19:00 GMT

Overview

JSONFFIEngine is a Java wrapper around the MLC LLM JSON FFI engine, exposing the native TVM-based LLM inference engine to Java code through TVM's Java runtime bindings (org.apache.tvm).

Description

The JSONFFIEngine class uses TVM's Java runtime (org.apache.tvm) to create and manage a native JSON FFI engine instance. On construction, it calls the TVM function mlc.json_ffi.CreateJSONFFIEngine to instantiate the underlying engine module, then resolves references to the following TVM-registered functions:

  • initBackgroundEngineFunc -- Initializes the background engine with a device and a stream callback
  • reloadFunc -- Reloads the engine with a new configuration JSON string
  • unloadFunc -- Unloads the current model
  • resetFunc -- Resets the engine state
  • chatCompletionFunc -- Submits a chat completion request with a JSON string and request ID
  • abortFunc -- Aborts a running request
  • getLastErrorFunc -- Retrieves the last error message
  • runBackgroundLoopFunc -- Runs the main background processing loop
  • runBackgroundStreamBackLoopFunc -- Runs the background stream-back loop for streaming responses
  • exitBackgroundLoopFunc -- Exits the background loop

The initBackgroundEngine method sets up an OpenCL device and wraps a KotlinFunction callback (a functional interface accepting a String) into a TVM Function.Callback that receives streamed chat completion response JSON strings.
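Because KotlinFunction is a plain functional interface with a single String parameter, any Java lambda (or Kotlin lambda) satisfies it. The sketch below illustrates this shape in isolation; the interface is reproduced locally so the snippet compiles without the MLC LLM or TVM dependencies, and the class name CallbackDemo is illustrative:

```java
public class CallbackDemo {
    // Local mirror of JSONFFIEngine.KotlinFunction, reproduced here so this
    // sketch is self-contained (the real interface lives in ai.mlc.mlcllm).
    public interface KotlinFunction {
        void invoke(String arg);
    }

    // Collects every streamed response chunk the callback receives.
    static final StringBuilder received = new StringBuilder();

    public static void main(String[] args) {
        // Any lambda taking a String satisfies the functional interface;
        // JSONFFIEngine wraps such a callback into a TVM Function.Callback
        // internally before handing it to the native engine.
        KotlinFunction onResponse = json -> received.append(json);
        onResponse.invoke("{\"choices\":[]}");
        System.out.println(received);
    }
}
```

In the real engine, the wrapped callback is invoked once per streamed response chunk, so the handler should be cheap and thread-safe.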

Usage

This class is used by the Android application layer (typically from Kotlin code via the KotlinFunction interface) to interact with the native MLC LLM inference engine. It is the primary bridge between the Android UI layer and the compiled model backend.
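Since runBackgroundLoop and runBackgroundStreamBackLoop both block their calling thread, the application layer typically hosts each on a dedicated thread and uses exitBackgroundLoop to shut them down. The following self-contained sketch models that lifecycle with a stub engine (StubEngine and the CountDownLatch-based loop are stand-ins for the native engine, not part of the real API):

```java
import java.util.concurrent.CountDownLatch;

public class LoopHostDemo {
    // Stub standing in for the native engine's blocking background loop:
    // runBackgroundLoop() blocks until exitBackgroundLoop() is called.
    static class StubEngine {
        final CountDownLatch exit = new CountDownLatch(1);

        void runBackgroundLoop() {
            try {
                exit.await(); // blocks the hosting thread, like the real loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void exitBackgroundLoop() {
            exit.countDown(); // signals the loop to return
        }
    }

    // Starts the loop on its own thread, signals shutdown, and reports
    // whether the loop thread actually terminated.
    public static boolean startAndStop() {
        StubEngine engine = new StubEngine();
        Thread loop = new Thread(engine::runBackgroundLoop, "mlc-background-loop");
        loop.start();
        engine.exitBackgroundLoop();
        try {
            loop.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !loop.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(startAndStop() ? "stopped" : "still running");
    }
}
```

The real engine needs two such threads, one per loop; the shutdown signal pattern is the same.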

Code Reference

Source Location

Signature

public class JSONFFIEngine {
    public JSONFFIEngine()
    public void initBackgroundEngine(KotlinFunction callback)
    public void reload(String engineConfigJSONStr)
    public void chatCompletion(String requestJSONStr, String requestId)
    public void runBackgroundLoop()
    public void runBackgroundStreamBackLoop()
    public void exitBackgroundLoop()
    public void unload()
    public void reset()

    public interface KotlinFunction {
        void invoke(String arg);
    }
}

Import

import ai.mlc.mlcllm.JSONFFIEngine;

I/O Contract

Inputs

Method                Parameter            Type            Description
initBackgroundEngine  callback             KotlinFunction  Callback invoked with a JSON string for each streamed chat completion response
reload                engineConfigJSONStr  String          JSON configuration string for the engine (model path, parameters, etc.)
chatCompletion        requestJSONStr       String          JSON string of the chat completion request, following the OpenAI protocol
chatCompletion        requestId            String          Unique identifier for the request

Outputs / Side Effects

Method                       Side Effect         Description
initBackgroundEngine         Engine initialized  Creates an OpenCL device and registers the stream callback with the native engine
reload                       Model loaded        Loads a new model configuration into the engine
chatCompletion               Inference started   Begins processing a chat completion; responses are streamed via the callback
runBackgroundLoop            Blocking loop       Runs the engine's main processing loop (blocks the calling thread)
runBackgroundStreamBackLoop  Blocking loop       Runs the stream-back loop that delivers responses (blocks the calling thread)
exitBackgroundLoop           Loop terminated     Signals the background loops to exit
unload                       Model unloaded      Releases the currently loaded model from memory
reset                        State cleared       Resets the engine to its initial state
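The requestJSONStr passed to chatCompletion follows the OpenAI chat completion protocol. A minimal sketch of assembling such a request string in plain Java is shown below; ChatRequestBuilder is a hypothetical helper, not part of the JSONFFIEngine API, and a real application would normally use a JSON library instead of manual string building:

```java
public class ChatRequestBuilder {
    // Minimal JSON string escaping for backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a minimal OpenAI-style chat completion request with a single
    // user message; "stream" requests streamed response chunks.
    public static String build(String model, String userMessage, boolean stream) {
        return "{"
            + "\"model\":\"" + escape(model) + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":\""
            + escape(userMessage) + "\"}],"
            + "\"stream\":" + stream
            + "}";
    }

    public static void main(String[] args) {
        // The model identifier here is illustrative only.
        System.out.println(build("some-model-id", "Hello!", true));
    }
}
```

The resulting string would be passed as the first argument to chatCompletion, alongside an application-chosen request ID.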

Usage Examples

// Create and initialize the engine
JSONFFIEngine engine = new JSONFFIEngine();

// Register a callback for streamed chat completion responses
engine.initBackgroundEngine(responseJson -> {
    // Handle streamed chat completion response JSON
    Log.d("MLCChat", "Response: " + responseJson);
});

// Host the two blocking background loops on dedicated threads so the
// engine can process requests and stream responses back
new Thread(engine::runBackgroundLoop).start();
new Thread(engine::runBackgroundStreamBackLoop).start();

// Load a model
engine.reload(engineConfigJson);

// Submit a chat completion request
engine.chatCompletion(requestJson, "request-001");

// When done, clean up
engine.exitBackgroundLoop();
engine.unload();
