Implementation:Ggml org Llama cpp Android InferenceEngineImpl

From Leeroopedia
Knowledge Sources
Domains: Android, Inference
Last Updated: 2026-02-15 00:00 GMT

Overview

Singleton JNI wrapper implementation of the `InferenceEngine` interface that manages the full lifecycle of a llama.cpp model instance on Android.

Description

The class has a private constructor; `getInstance(Context)` creates the singleton using thread-safe double-checked locking. It declares `@FastNative` external JNI methods (`init`, `load`, `prepare`, `systemInfo`, `benchModel`, `processSystemPrompt`, `processUserPrompt`, `generateTokens`, `cleanUp`, `destroy`) that map to native C++ functions in `ai_chat.cpp`. State transitions are tracked in a `MutableStateFlow<State>`. Native operations run on a dedicated single-threaded coroutine dispatcher, so calls into the native layer are serialized and never overlap. Token generation is exposed as a Kotlin `Flow<String>`.
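The double-checked locking mentioned above can be sketched in plain Kotlin. This is a minimal illustration, not the code from `InferenceEngineImpl.kt`: `SketchEngine`, its `constructions` counter, and the string argument are hypothetical names introduced here, and the real `getInstance` takes an Android `Context`.

```kotlin
// Hypothetical stand-in for the engine class; not the real InferenceEngineImpl.
class SketchEngine private constructor(val nativeLibDir: String) {
    companion object {
        // @Volatile ensures every thread sees the published reference.
        @Volatile
        private var instance: SketchEngine? = null

        // Demonstration-only counter of how many times the constructor ran.
        var constructions = 0
            private set

        fun getInstance(nativeLibDir: String): SketchEngine =
            // First check: no locking on the fast path once the instance exists.
            instance ?: synchronized(this) {
                // Second check: another thread may have created it while we waited.
                instance ?: SketchEngine(nativeLibDir).also {
                    constructions++
                    instance = it
                }
            }
    }
}

fun main() {
    // Race eight threads for the instance; only one construction should occur.
    val threads = List(8) { i -> Thread { SketchEngine.getInstance("/lib/$i") } }
    threads.forEach { it.start() }
    threads.forEach { it.join() }
    println(SketchEngine.constructions) // prints 1
}
```

The first null check keeps later calls lock-free, while the second check inside `synchronized` guards against two threads both passing the first check at the same time.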

Usage

Use this class as the primary entry point for LLM inference on Android. Obtain the singleton via `getInstance(context)`, then load models, send prompts, and collect generated tokens as Kotlin Flow streams within coroutine scopes.
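To experiment with this calling pattern without a device or the native library, a stand-in engine can mimic the same shape: a single-threaded dispatcher, a `StateFlow` for lifecycle state, and a `Flow<String>` of tokens. Everything in the sketch below is an assumption made for illustration: `FakeEngine`, `SketchState`, and the canned token list are hypothetical, and the sketch requires the `kotlinx-coroutines-core` library on the classpath.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical states; the real State type is defined by InferenceEngine.
enum class SketchState { Idle, Loading, Ready, Generating }

// Hypothetical stand-in that replays canned tokens instead of calling native code.
class FakeEngine(private val cannedTokens: List<String>) : AutoCloseable {
    // One worker thread, so all engine work is serialized.
    private val executor = Executors.newSingleThreadExecutor()
    private val engineDispatcher = executor.asCoroutineDispatcher()

    private val _state = MutableStateFlow(SketchState.Idle)
    val state: StateFlow<SketchState> = _state

    suspend fun loadModel(modelPath: String) = withContext(engineDispatcher) {
        _state.value = SketchState.Loading
        // A real engine would load the file at modelPath here.
        _state.value = SketchState.Ready
    }

    fun sendUserPrompt(prompt: String): Flow<String> = flow {
        _state.value = SketchState.Generating
        try {
            // A real engine would generate tokens for the prompt here.
            for (token in cannedTokens) emit(token)
        } finally {
            _state.value = SketchState.Ready
        }
    }.flowOn(engineDispatcher) // run the producer on the engine thread

    override fun close() {
        executor.shutdown()
    }
}

fun main(): Unit = runBlocking {
    FakeEngine(listOf("Par", "is")).use { engine ->
        engine.loadModel("/hypothetical/model.gguf")
        val tokens = engine.sendUserPrompt("What is the capital of France?").toList()
        println(tokens.joinToString("")) // prints Paris
    }
}
```

Using `flowOn` keeps token production on the engine thread while the collector stays on the caller's context, which is one way to get the serialization described above without blocking the caller.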

Code Reference

Source Location

  • Repository: Ggml_org_Llama_cpp
  • File: examples/llama.android/lib/src/main/java/com/arm/aichat/internal/InferenceEngineImpl.kt
  • Lines: 1-324

Signature

internal class InferenceEngineImpl private constructor(
    private val nativeLibDir: String
) : InferenceEngine {

    companion object {
        internal fun getInstance(context: Context): InferenceEngine
    }

    // JNI native methods
    @FastNative external fun init(nativeLibDir: String)
    @FastNative external fun load(modelPath: String): Int
    @FastNative external fun prepare()
    @FastNative external fun systemInfo(): String
    @FastNative external fun benchModel(pp: Int, tg: Int, pl: Int, nr: Int): String
    @FastNative external fun processSystemPrompt(prompt: String)
    @FastNative external fun processUserPrompt(prompt: String)
    @FastNative external fun generateTokens(): String
    @FastNative external fun cleanUp()
    @FastNative external fun destroy()

    // Kotlin API
    suspend fun loadModel(modelPath: String)
    fun sendUserPrompt(prompt: String): Flow<String>
    val state: StateFlow<State>
}

Import

import android.content.Context
import com.arm.aichat.InferenceEngine
import dalvik.annotation.optimization.FastNative
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| context | Context | Yes | Android Context for obtaining the native library directory path |
| modelPath | String | Yes | Absolute path to the GGUF model file on the device |
| prompt | String | Yes | User prompt text to send for inference |

Outputs

| Name | Type | Description |
|------|------|-------------|
| state | StateFlow<State> | Observable state flow tracking engine lifecycle (Idle, Loading, Ready, Generating, etc.) |
| tokenFlow | Flow<String> | Kotlin Flow returned by `sendUserPrompt`, emitting generated tokens one at a time as strings |
| systemInfo | String | Backend and system capability information string |

Usage Examples

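// Run inside a coroutine: loadModel and collect are suspending calls.
// `scope` stands for any CoroutineScope you own (for example, viewModelScope).
scope.launch {
    // Obtain singleton instance
    val engine = InferenceEngineImpl.getInstance(applicationContext)

    // Observe state transitions in a child coroutine,
    // because collecting a StateFlow never completes on its own
    launch {
        engine.state.collect { state ->
            when (state) {
                is State.Idle -> { /* ready for loading */ }
                is State.Ready -> { /* model loaded, ready for prompts */ }
                is State.Generating -> { /* currently generating */ }
                else -> { /* other lifecycle states */ }
            }
        }
    }

    // Load a model (suspends until loading finishes)
    engine.loadModel("/data/local/tmp/model.gguf")

    // Send a user prompt and collect generated tokens
    engine.sendUserPrompt("What is the capital of France?")
        .collect { token ->
            print(token)
        }
}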
