Implementation:Ggml org Llama cpp Android AI Chat JNI
| Knowledge Sources | |
|---|---|
| Domains | Android, JNI |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
JNI native implementation that exposes llama.cpp model loading, context initialization, prompt processing, and token generation to the Android Kotlin layer.
Description
Manages global state for the llama model, context, batch, chat templates, and sampler, and provides JNI functions that map to `InferenceEngineImpl` methods:
- `init` loads backends from the native library directory.
- `load` loads a GGUF model file and returns a status code.
- `prepare` initializes the context and sets the thread count, which the constants bound to between 2 and 4 threads with a headroom of 2.
- `processSystemPrompt` and `processUserPrompt` apply the chat template and decode the resulting tokens.
- `generateTokens` performs autoregressive sampling and returns one token string per call.
The batch size is 512, the default context size is 8192, and the default sampler temperature is 0.3.
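A batch size of 512 means a prompt with more tokens than that cannot be submitted in a single batch and has to be decoded in chunks. The following is a minimal sketch of that chunking arithmetic only. `batch_ranges` is a hypothetical helper introduced for illustration, not a function from ai_chat.cpp, and the actual source may organize its decode loop differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors the BATCH_SIZE constant from the Signature section.
constexpr std::size_t BATCH_SIZE = 512;

// Hypothetical helper (not part of ai_chat.cpp): returns half-open
// [begin, end) index ranges that cover n_tokens in chunks of at most
// batch_size tokens each.
std::vector<std::pair<std::size_t, std::size_t>>
batch_ranges(std::size_t n_tokens, std::size_t batch_size = BATCH_SIZE) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (batch_size == 0) {
        return ranges;  // avoid an infinite loop on a zero batch size
    }
    for (std::size_t begin = 0; begin < n_tokens; begin += batch_size) {
        // The last chunk may be shorter than batch_size.
        ranges.emplace_back(begin, std::min(begin + batch_size, n_tokens));
    }
    return ranges;
}
```

Each range would correspond to one decode call, so a 1200-token prompt yields three chunks of 512, 512, and 176 tokens.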
Usage
Use this native bridge when building Android applications that need on-device LLM inference. It translates Kotlin API calls into llama.cpp C++ operations and adds Android-specific handling such as thread-count management and Android logging integration.
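The thread management mentioned here can be read off the `N_THREADS_MIN`, `N_THREADS_MAX`, and `N_THREADS_HEADROOM` constants in the Signature section. A plausible reading, shown below as a sketch, is that the headroom is subtracted from the number of available cores and the result is clamped to the minimum and maximum. `pick_thread_count` is a hypothetical name, and the exact formula used in ai_chat.cpp is an assumption here.

```cpp
#include <algorithm>

// Values copied from the Signature section.
constexpr int N_THREADS_MIN      = 2;
constexpr int N_THREADS_MAX      = 4;
constexpr int N_THREADS_HEADROOM = 2;

// Hypothetical helper (assumed formula): leave N_THREADS_HEADROOM cores
// free for the rest of the system, then clamp to [N_THREADS_MIN, N_THREADS_MAX].
int pick_thread_count(int online_cores) {
    const int available = online_cores - N_THREADS_HEADROOM;
    return std::max(N_THREADS_MIN, std::min(N_THREADS_MAX, available));
}
```

Under this reading, a 5-core device would use 3 threads, while any device with 6 or more cores would use 4. On Android the core count could come from a call such as `sysconf(_SC_NPROCESSORS_ONLN)`, which is left out of the sketch to keep it platform-independent.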
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/llama.android/lib/src/main/cpp/ai_chat.cpp
- Lines: 1-565
Signature
// Constants
constexpr int N_THREADS_MIN = 2;
constexpr int N_THREADS_MAX = 4;
constexpr int N_THREADS_HEADROOM = 2;
constexpr int DEFAULT_CONTEXT_SIZE = 8192;
constexpr int BATCH_SIZE = 512;
constexpr float DEFAULT_SAMPLER_TEMP = 0.3f;
// JNI Functions
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_init(JNIEnv*, jobject, jstring nativeLibDir);
JNIEXPORT jint JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_load(JNIEnv*, jobject, jstring jmodel_path);
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_prepare(JNIEnv*, jobject);
JNIEXPORT jstring JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_systemInfo(JNIEnv*, jobject);
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_processSystemPrompt(JNIEnv*, jobject, jstring);
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_processUserPrompt(JNIEnv*, jobject, jstring);
JNIEXPORT jstring JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_generateTokens(JNIEnv*, jobject);
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_cleanUp(JNIEnv*, jobject);
JNIEXPORT void JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_destroy(JNIEnv*, jobject);
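The long function names above follow the JNI naming convention: the prefix `Java_`, the fully qualified class name with separators replaced by underscores, another underscore, and the method name. The sketch below derives such a name to make the mapping to `com.arm.aichat.internal.InferenceEngineImpl` explicit. `jni_symbol_name` is a hypothetical helper for illustration only, and it handles just the common cases (package separators and literal underscores), not the escapes for non-ASCII characters or overloaded methods.

```cpp
#include <string>

// Escapes one name component per the JNI convention (simplified):
// '.' or '/' becomes '_', and a literal '_' becomes "_1".
static std::string jni_mangle(const std::string& name) {
    std::string out;
    for (char c : name) {
        if (c == '.' || c == '/') {
            out += '_';
        } else if (c == '_') {
            out += "_1";
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: builds the native symbol name for a non-overloaded
// method of the given fully qualified class.
std::string jni_symbol_name(const std::string& fq_class, const std::string& method) {
    return "Java_" + jni_mangle(fq_class) + "_" + jni_mangle(method);
}
```

For example, the class `com.arm.aichat.internal.InferenceEngineImpl` and the method `init` produce `Java_com_arm_aichat_internal_InferenceEngineImpl_init`, which matches the first signature listed above.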
Import
#include <android/log.h>
#include <jni.h>
#include <sampling.h>
#include "logging.h"
#include "chat.h"
#include "common.h"
#include "llama.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| nativeLibDir | jstring | Yes | Path to native library directory for backend loading |
| jmodel_path | jstring | Yes | Path to the GGUF model file on the device |
| system_prompt | jstring | No | System prompt to apply via chat template |
| user_prompt | jstring | Yes | User prompt to process and generate a response for |
Outputs
| Name | Type | Description |
|---|---|---|
| return_code | jint | 0 on success, non-zero on failure (for load) |
| token | jstring | Next generated token string (for generateTokens), or empty string at end of sequence |
| system_info | jstring | System information string describing backend capabilities |
Usage Examples
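// Typical call order. Kotlin normally reaches these functions through the
// native methods of InferenceEngineImpl; the direct calls only show the sequence.
// 1. Initialize backends
Java_com_arm_aichat_internal_InferenceEngineImpl_init(env, obj, nativeLibDir);
// 2. Load a GGUF model; a non-zero return code means the load failed
jint result = Java_com_arm_aichat_internal_InferenceEngineImpl_load(env, obj, modelPath);
if (result != 0) {
    return;
}
// 3. Prepare context
Java_com_arm_aichat_internal_InferenceEngineImpl_prepare(env, obj);
// 4. Process prompts
Java_com_arm_aichat_internal_InferenceEngineImpl_processSystemPrompt(env, obj, systemPrompt);
Java_com_arm_aichat_internal_InferenceEngineImpl_processUserPrompt(env, obj, userPrompt);
// 5. Generate tokens until the end of the sequence
while (true) {
    jstring token = Java_com_arm_aichat_internal_InferenceEngineImpl_generateTokens(env, obj);
    // The I/O contract reports end of sequence as an empty string; also stop on nullptr
    if (token == nullptr || env->GetStringUTFLength(token) == 0) {
        break;
    }
    // process token
    env->DeleteLocalRef(token);  // release the local reference before the next iteration
}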
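The generation loop can also be sketched without any JNI types, which makes the stopping rule easier to see and to test. In the sketch below, `TokenSource` and `collect_response` are hypothetical names, and the callback stands in for `generateTokens`. It assumes that a null pointer or an empty string marks the end of the sequence, and it adds a `max_tokens` cap as a safety limit that the original example does not have.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for generateTokens: returns the next token text,
// or nullptr / "" once the sequence has ended (assumed contract).
using TokenSource = std::function<const char*()>;

// Concatenates tokens until the source signals the end of the sequence
// or max_tokens tokens have been collected, whichever comes first.
std::string collect_response(const TokenSource& next_token, int max_tokens) {
    std::string response;
    for (int i = 0; i < max_tokens; ++i) {
        const char* token = next_token();
        if (token == nullptr || token[0] == '\0') {
            break;  // end of sequence
        }
        response += token;
    }
    return response;
}
```

The cap guards against a source that never reports the end of the sequence, which is worth having when the loop runs on a mobile device with limited memory and battery.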