Implementation:Ggml org Llama cpp Android AI Chat JNI

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Android, JNI
Last Updated	2026-02-15 00:00 GMT

Overview

JNI native implementation that exposes llama.cpp model loading, context initialization, prompt processing, and token generation to the Android Kotlin layer.

Description

The implementation keeps the llama model, context, batch, chat templates, and sampler in global state and provides JNI functions that map to `InferenceEngineImpl` methods. `init` loads backends from the native library directory. `load` loads a GGUF model file. `prepare` initializes the context with a thread count kept between `N_THREADS_MIN` (2) and `N_THREADS_MAX` (4), with a headroom of `N_THREADS_HEADROOM` (2). `processSystemPrompt` and `processUserPrompt` apply the chat template and decode the resulting tokens. `generateTokens` performs autoregressive sampling and returns one token string per call. The batch size is 512 (`BATCH_SIZE`), the default context size is 8192 (`DEFAULT_CONTEXT_SIZE`), and the default sampler temperature is 0.3 (`DEFAULT_SAMPLER_TEMP`).
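As a rough illustration of how the thread and batch constants might be applied, the sketch below clamps a thread count and splits a prompt into batch-sized chunks. The helper names `pick_thread_count` and `split_into_batches` are hypothetical, and the exact formula used in `ai_chat.cpp` is an assumption; only the constant values come from this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Values copied from the Signature section of this page.
constexpr int N_THREADS_MIN      = 2;
constexpr int N_THREADS_MAX      = 4;
constexpr int N_THREADS_HEADROOM = 2;
constexpr int BATCH_SIZE         = 512;

// Hypothetical helper: leave N_THREADS_HEADROOM cores free, then clamp the
// result to [N_THREADS_MIN, N_THREADS_MAX]. The real formula is an assumption.
int pick_thread_count(int online_cores) {
    const int candidate = online_cores - N_THREADS_HEADROOM;
    return std::max(N_THREADS_MIN, std::min(N_THREADS_MAX, candidate));
}

// Hypothetical helper: split n_tokens into (offset, length) chunks of at most
// BATCH_SIZE tokens, the way a long prompt could be decoded piece by piece.
std::vector<std::pair<std::size_t, std::size_t>> split_into_batches(std::size_t n_tokens) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t i = 0; i < n_tokens; i += BATCH_SIZE) {
        chunks.emplace_back(i, std::min<std::size_t>(BATCH_SIZE, n_tokens - i));
    }
    return chunks;
}
```

Under these assumptions an 8-core device would use 4 threads, a 4-core device would use 2, and a 1200-token prompt would be decoded in three chunks.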

Usage

Use this native bridge when building Android applications that require on-device LLM inference, as it translates Kotlin API calls into llama.cpp C++ operations with Android-specific optimizations like thread management and Android logging integration.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: examples/llama.android/lib/src/main/cpp/ai_chat.cpp
Lines: 1-565

Signature

// Constants
constexpr int   N_THREADS_MIN        = 2;
constexpr int   N_THREADS_MAX        = 4;
constexpr int   N_THREADS_HEADROOM   = 2;
constexpr int   DEFAULT_CONTEXT_SIZE = 8192;
constexpr int   BATCH_SIZE           = 512;
constexpr float DEFAULT_SAMPLER_TEMP = 0.3f;

// JNI Functions
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_init(JNIEnv*, jobject, jstring nativeLibDir);
JNIEXPORT jint   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_load(JNIEnv*, jobject, jstring jmodel_path);
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_prepare(JNIEnv*, jobject);
JNIEXPORT jstring JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_systemInfo(JNIEnv*, jobject);
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_processSystemPrompt(JNIEnv*, jobject, jstring);
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_processUserPrompt(JNIEnv*, jobject, jstring);
JNIEXPORT jstring JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_generateTokens(JNIEnv*, jobject);
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_cleanUp(JNIEnv*, jobject);
JNIEXPORT void   JNICALL Java_com_arm_aichat_internal_InferenceEngineImpl_destroy(JNIEnv*, jobject);
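The long function names above follow the standard JNI naming convention: `Java_`, the fully qualified class name, and the method name, joined by underscores. The sketch below reproduces that mangling for ASCII names only. The helpers `jni_mangle` and `jni_symbol` are hypothetical names introduced for illustration and are not part of `ai_chat.cpp`.

```cpp
#include <string>

// Escape one name according to the JNI symbol rules for ASCII input:
// '.' or '/' becomes '_', '_' becomes "_1", ';' becomes "_2", '[' becomes "_3".
// Non-ASCII characters (which JNI encodes as _0xxxx) are not handled here.
std::string jni_mangle(const std::string& name) {
    std::string out;
    for (char c : name) {
        switch (c) {
            case '.':
            case '/': out += '_';  break;
            case '_': out += "_1"; break;
            case ';': out += "_2"; break;
            case '[': out += "_3"; break;
            default:  out += c;    break;
        }
    }
    return out;
}

// Build the native symbol name for a non-overloaded native method.
std::string jni_symbol(const std::string& fq_class, const std::string& method) {
    return "Java_" + jni_mangle(fq_class) + "_" + jni_mangle(method);
}
```

For example, `jni_symbol("com.arm.aichat.internal.InferenceEngineImpl", "init")` yields the first function name listed in the signature block.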

Import

#include <android/log.h>
#include <jni.h>
#include <sampling.h>
#include "logging.h"
#include "chat.h"
#include "common.h"
#include "llama.h"

I/O Contract

Inputs

Name	Type	Required	Description
nativeLibDir	jstring	Yes	Path to native library directory for backend loading
jmodel_path	jstring	Yes	Path to the GGUF model file on the device
system_prompt	jstring	No	System prompt to apply via chat template
user_prompt	jstring	Yes	User prompt to process and generate a response for

Outputs

Name	Type	Description
return_code	jint	0 on success, non-zero on failure (for load)
token	jstring	Next generated token string (for generateTokens), or empty string at end of sequence
system_info	jstring	System information string describing backend capabilities

Usage Examples

// Illustrative call order. In the app these functions are invoked from Kotlin
// through JNI; env, obj, and the jstring arguments are supplied by the caller.

// 1. Initialize backends
Java_com_arm_aichat_internal_InferenceEngineImpl_init(env, obj, nativeLibDir);

// 2. Load a GGUF model; a non-zero return code means the load failed
jint result = Java_com_arm_aichat_internal_InferenceEngineImpl_load(env, obj, modelPath);
if (result != 0) {
    // load failed: stop here instead of calling prepare
    return;
}

// 3. Prepare context
Java_com_arm_aichat_internal_InferenceEngineImpl_prepare(env, obj);

// 4. Process prompts
Java_com_arm_aichat_internal_InferenceEngineImpl_processSystemPrompt(env, obj, systemPrompt);
Java_com_arm_aichat_internal_InferenceEngineImpl_processUserPrompt(env, obj, userPrompt);

// 5. Generate tokens in a loop until a null jstring is returned
jstring token;
while ((token = Java_com_arm_aichat_internal_InferenceEngineImpl_generateTokens(env, obj)) != nullptr) {
    // process token
    env->DeleteLocalRef(token);  // release the local reference on every iteration
}
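The generation loop in step 5 can be exercised off-device by replacing the JNI call with a plain callable. The sketch below is a minimal model under stated assumptions: `TokenStep` and `collect_response` are hypothetical names, a null pointer stands in for a null `jstring`, and the `max_tokens` bound is an added safeguard rather than something taken from `ai_chat.cpp`.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for one generateTokens call: a non-null pointer is the
// next piece of text, nullptr models a null jstring (assumed end of generation).
using TokenStep = std::function<const char*()>;

// Concatenate pieces until the step reports the end or max_tokens is reached.
// Empty pieces are appended unchanged, so they do not end the loop.
std::string collect_response(const TokenStep& next_token, std::size_t max_tokens) {
    std::string response;
    for (std::size_t i = 0; i < max_tokens; ++i) {
        const char* piece = next_token();
        if (piece == nullptr) {
            break;  // end of generation
        }
        response += piece;
    }
    return response;
}
```

Bounding the loop keeps the caller in control even if the end-of-sequence signal never arrives, which is useful when the exact termination contract of the native side is uncertain.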

Related Pages

Principle:Ggml_org_Llama_cpp_Android_Inference
