Implementation:Ggml org Llama cpp Android InferenceEngine

From Leeroopedia
Revision as of 12:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Android_InferenceEngine.md)
Knowledge Sources
Domains: Android, API
Last Updated: 2026-02-15 00:00 GMT

Overview

Defines the public interface for the LLM inference engine, specifying all operations and state transitions for model loading, prompt processing, and token generation.

Description

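Declares the `InferenceEngine` interface with the following methods: `loadModel` loads a GGUF model, `setSystemPrompt` sets the system instructions, `sendUserPrompt` returns a `Flow<String>` of generated tokens, `bench` runs a benchmark, `cleanUp` unloads the model, and `destroy` performs a full cleanup. The engine exposes its lifecycle through a `StateFlow<State>`, where `State` is a sealed class with the cases `Uninitialized`, `Initializing`, `Initialized`, `LoadingModel`, `UnloadingModel`, `ModelReady`, `Benchmarking`, `ProcessingSystemPrompt`, `ProcessingUserPrompt`, `Generating`, and `Error`. The extension properties `isUninterruptible` and `isModelLoaded` provide convenient state checks, and the file also declares `UnsupportedArchitectureException`.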
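The two extension properties can be pictured as simple checks over the sealed hierarchy. The following is a minimal, self-contained sketch rather than the library's code: `EngineState` is a hypothetical stand-in for `InferenceEngine.State`, and the choice of which states count as "model loaded" or "uninterruptible" is an assumption made for illustration only.

```kotlin
// Hypothetical stand-in for InferenceEngine.State so the sketch compiles on its own.
sealed class EngineState {
    object Uninitialized : EngineState()
    object Initializing : EngineState()
    object Initialized : EngineState()
    object LoadingModel : EngineState()
    object UnloadingModel : EngineState()
    object ModelReady : EngineState()
    object Benchmarking : EngineState()
    object ProcessingSystemPrompt : EngineState()
    object ProcessingUserPrompt : EngineState()
    object Generating : EngineState()
    data class Error(val exception: Exception) : EngineState()
}

// Assumption: a model counts as loaded from ModelReady onward, including
// while the engine is busy benchmarking, processing prompts, or generating.
val EngineState.isModelLoaded: Boolean
    get() = when (this) {
        EngineState.ModelReady,
        EngineState.Benchmarking,
        EngineState.ProcessingSystemPrompt,
        EngineState.ProcessingUserPrompt,
        EngineState.Generating -> true
        else -> false
    }

// Assumption: transitional and busy states must not be interrupted by a new
// request, while token generation may be cancelled by the caller.
val EngineState.isUninterruptible: Boolean
    get() = when (this) {
        EngineState.Initializing,
        EngineState.LoadingModel,
        EngineState.UnloadingModel,
        EngineState.Benchmarking,
        EngineState.ProcessingSystemPrompt,
        EngineState.ProcessingUserPrompt -> true
        else -> false
    }
```

Keeping these checks as extension properties rather than members leaves the sealed class itself a plain list of states, and callers such as UI code can gate buttons on a single boolean instead of repeating the `when` expression.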

Usage

Use this interface as the core abstraction layer of the Android AI Chat library. It decouples the public API from the JNI implementation, which supports a clean architecture and testability, and it defines the complete lifecycle contract for LLM inference operations.
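To illustrate the testability point, the sketch below swaps the JNI-backed engine for an in-memory fake. Everything in it is hypothetical: `ChatEngine` is a trimmed copy of the interface with only two methods, `FakeState` replaces the sealed `State` class, and `FakeChatEngine` and `fullReply` are names invented for this example. It assumes the `kotlinx-coroutines-core` library is on the classpath.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical, reduced state set standing in for InferenceEngine.State.
enum class FakeState { Uninitialized, LoadingModel, ModelReady, Generating }

// Hypothetical, trimmed copy of the interface so the sketch is self-contained.
interface ChatEngine {
    val state: StateFlow<FakeState>
    suspend fun loadModel(pathToModel: String)
    fun sendUserPrompt(message: String): Flow<String>
}

// Test double: no JNI and no model file, just canned tokens.
class FakeChatEngine(private val cannedTokens: List<String>) : ChatEngine {
    private val _state = MutableStateFlow(FakeState.Uninitialized)
    override val state: StateFlow<FakeState> = _state.asStateFlow()

    override suspend fun loadModel(pathToModel: String) {
        _state.value = FakeState.LoadingModel
        _state.value = FakeState.ModelReady
    }

    override fun sendUserPrompt(message: String): Flow<String> = flow {
        check(_state.value == FakeState.ModelReady) { "Model not loaded" }
        _state.value = FakeState.Generating
        try {
            cannedTokens.forEach { emit(it) }
        } finally {
            // Return to ModelReady whether generation completes, fails, or is cancelled.
            _state.value = FakeState.ModelReady
        }
    }
}

// Code under test depends only on the interface, never on the JNI layer.
suspend fun fullReply(engine: ChatEngine, prompt: String): String =
    engine.sendUserPrompt(prompt).toList().joinToString("")

fun main(): Unit = runBlocking {
    val engine = FakeChatEngine(listOf("Hel", "lo", "!"))
    engine.loadModel("/ignored/by/the/fake.gguf")
    println(fullReply(engine, "Hi")) // prints "Hello!"
}
```

Because `fullReply` accepts the interface type, the same function would work unchanged with a real engine instance, which is the practical benefit of keeping the JNI details behind the abstraction.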

Code Reference

Source Location

  • Repository: Ggml_org_Llama_cpp
  • File: examples/llama.android/lib/src/main/java/com/arm/aichat/InferenceEngine.kt
  • Lines: 1-89

Signature

interface InferenceEngine {
    val state: StateFlow<State>
    suspend fun loadModel(pathToModel: String)
    suspend fun setSystemPrompt(systemPrompt: String)
    fun sendUserPrompt(message: String, predictLength: Int = DEFAULT_PREDICT_LENGTH): Flow<String>
    suspend fun bench(pp: Int, tg: Int, pl: Int, nr: Int = 1): String
    fun cleanUp()
    fun destroy()

    sealed class State {
        object Uninitialized : State()
        object Initializing : State()
        object Initialized : State()
        object LoadingModel : State()
        object UnloadingModel : State()
        object ModelReady : State()
        object Benchmarking : State()
        object ProcessingSystemPrompt : State()
        object ProcessingUserPrompt : State()
        object Generating : State()
        data class Error(val exception: Exception) : State()
    }
}

val State.isUninterruptible: Boolean
val State.isModelLoaded: Boolean
class UnsupportedArchitectureException : Exception()

Import

import com.arm.aichat.InferenceEngine.State
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.StateFlow

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| pathToModel | String | Yes | Filesystem path to the GGUF model file |
| systemPrompt | String | Yes | System prompt to configure model behavior |
| message | String | Yes | User prompt message to send to the model |
| predictLength | Int | No | Maximum number of tokens to generate (default 1024) |
| pp | Int | Yes | Prompt processing batch size for benchmarking |
| tg | Int | Yes | Token generation count for benchmarking |
| pl | Int | Yes | Pipeline length for benchmarking |
| nr | Int | No | Number of benchmark repetitions (default 1) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| state | StateFlow<State> | Observable state flow representing the engine lifecycle state |
| sendUserPrompt return | Flow<String> | Stream of generated token strings |
| bench return | String | Formatted benchmark results string |

Usage Examples

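// Load a model and generate tokens.
// loadModel, setSystemPrompt, and collect all suspend, so they must run inside a coroutine.
// lifecycleScope is an assumption here (e.g. inside an Activity); any CoroutineScope works.
val engine: InferenceEngine = AiChat.getInferenceEngine(context)
lifecycleScope.launch {
    engine.loadModel("/path/to/model.gguf")
    engine.setSystemPrompt("You are a helpful assistant.")

    // Tokens arrive one at a time as the model generates them
    engine.sendUserPrompt("Hello, world!")
        .collect { token ->
            print(token)
        }
}

// Check engine state
if (engine.state.value.isModelLoaded) {
    // Model is ready for inference
}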
