Implementation:Ggml org Llama cpp Android MainActivity
| Knowledge Sources | |
|---|---|
| Domains | Android, UI |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Main Android activity that provides a chat UI for interacting with a locally loaded GGUF language model on an Android device.
Description
Extends `AppCompatActivity` and manages a RecyclerView-based chat interface that shows user and assistant messages. On launch, it initializes the Arm AI Chat `InferenceEngine` via `AiChat.getInferenceEngine()`. The floating action button (FAB) has two roles: before a model is loaded it prompts the user to select a GGUF model file (via the `OpenDocument` contract), and afterwards it sends the user's input to the engine. When a model is selected, the activity parses its GGUF metadata, copies the file to internal storage, loads it into the engine, and sets a system prompt. Subsequent user messages are sent to the engine, and the generated tokens are streamed into the chat via a Kotlin Flow.
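The FAB's two roles described above can be sketched as a small pure function. This is a minimal illustration, not the code in `MainActivity.kt`: the names `FabAction` and `decideFabAction` are hypothetical, and trimming the input and rejecting empty messages are assumptions about how input validation might work.

```kotlin
// Hypothetical names: FabAction and decideFabAction are illustrative only
// and do not appear in MainActivity.kt.
sealed interface FabAction {
    object PickModel : FabAction                        // no model yet: open the document picker
    object RejectEmptyInput : FabAction                 // assumed validation step for blank input
    data class SendPrompt(val text: String) : FabAction // forward the message to the engine
}

// Decides what a FAB tap should do, given whether a model is loaded
// and the raw contents of the input field.
fun decideFabAction(isModelReady: Boolean, rawInput: String): FabAction {
    if (!isModelReady) return FabAction.PickModel
    val text = rawInput.trim()
    return if (text.isEmpty()) FabAction.RejectEmptyInput else FabAction.SendPrompt(text)
}
```

In an activity, the FAB's click listener could map `PickModel` to launching the document picker and `SendPrompt` to the prompt-sending path. Keeping the decision free of Android types makes it easy to unit test.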
Usage
Use this as the main entry point and reference implementation for the llama.cpp Android example app, demonstrating end-to-end on-device LLM inference with model loading, chat templating, and streaming token generation.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt
- Lines: 1-275
Signature
class MainActivity : AppCompatActivity() {
    private lateinit var engine: InferenceEngine
    private var generationJob: Job?
    override fun onCreate(savedInstanceState: Bundle?)
    private fun handleUserInput()
    private fun loadModel(uri: Uri)
}
Import
import android.net.Uri
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import com.arm.aichat.AiChat
import com.arm.aichat.InferenceEngine
import com.arm.aichat.gguf.GgufMetadata
import com.arm.aichat.gguf.GgufMetadataReader
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.launch
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| GGUF file | Uri | Yes | User-selected GGUF model file via Android document picker |
| user input | String | Yes | User chat message entered in the EditText field |
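Copying the selected GGUF file into internal storage can be sketched with plain JVM I/O. This is a hedged illustration rather than the activity's actual code: `copyModelToDir` is a hypothetical helper. On Android, the input stream would typically come from `contentResolver.openInputStream(uri)` and the target directory from `filesDir`.

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical helper (not from MainActivity.kt): copies a model stream into
// targetDir under fileName and returns the resulting file.
fun copyModelToDir(input: InputStream, targetDir: File, fileName: String): File {
    targetDir.mkdirs() // make sure the destination directory exists
    val target = File(targetDir, fileName)
    // `use` closes both streams even if the copy throws.
    input.use { src ->
        target.outputStream().use { dst -> src.copyTo(dst) }
    }
    return target
}
```

Because model files can be several gigabytes, a copy like this should run off the main thread, for example inside a coroutine on `Dispatchers.IO`.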
Outputs
| Name | Type | Description |
|---|---|---|
| chat messages | RecyclerView | Streamed assistant response tokens displayed in a chat interface |
| GGUF metadata | TextView | Model metadata displayed in the header |
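To give a sense of what GGUF metadata parsing starts from, the sketch below reads only the fixed-size file header. It is a standalone illustration and does not use or reproduce the `GgufMetadataReader` API. The names `GgufHeader` and `readGgufHeader` are hypothetical, and the layout assumes GGUF version 2 or later (little-endian, 64-bit counts).

```kotlin
import java.io.DataInputStream
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical type: the fixed fields at the start of a GGUF (v2+) file.
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

// Reads the 24-byte header: 4-byte magic "GGUF", uint32 version,
// uint64 tensor count, uint64 metadata key/value count (all little-endian).
fun readGgufHeader(input: InputStream): GgufHeader {
    val raw = ByteArray(24)
    DataInputStream(input).readFully(raw) // throws EOFException on a short file
    val buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN)
    val magic = ByteArray(4).also { buf.get(it) }
    require(magic.contentEquals("GGUF".toByteArray(Charsets.US_ASCII))) { "Not a GGUF file" }
    val version = buf.int
    val tensorCount = buf.long
    val metadataKvCount = buf.long
    return GgufHeader(version, tensorCount, metadataKvCount)
}
```

A check like this can reject a non-GGUF file picked by mistake before the more expensive copy and load steps run.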
Usage Examples
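// Initialize the inference engine off the main thread
lifecycleScope.launch(Dispatchers.Default) {
    engine = AiChat.getInferenceEngine(applicationContext)
}

// Send the user prompt and collect streamed tokens.
// collect() is a suspend function, so it must run inside a coroutine;
// keeping the Job allows the generation to be cancelled later.
generationJob = lifecycleScope.launch(Dispatchers.Default) {
    engine.sendUserPrompt(userMessage)
        .onCompletion { /* handle completion, e.g. re-enable the input field */ }
        // appendToChat is a placeholder; it must switch to the main thread before touching views
        .collect { token -> appendToChat(token) }
}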
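What a helper like `appendToChat(token)` might do to the chat state can be sketched without any Android types. This is an assumption-laden illustration: `Message` and `ChatTranscript` are hypothetical names, and the real activity may organize its message list differently.

```kotlin
// Hypothetical chat state; Message and ChatTranscript are illustrative names.
data class Message(val content: String, val isUser: Boolean)

class ChatTranscript {
    private val _messages = mutableListOf<Message>()
    val messages: List<Message> get() = _messages

    // Adds the user's message followed by an empty assistant placeholder.
    fun beginTurn(userText: String) {
        _messages += Message(userText, isUser = true)
        _messages += Message("", isUser = false)
    }

    // Appends a streamed token to the trailing assistant message and
    // returns that message's index so the caller can refresh one row.
    fun appendToken(token: String): Int {
        val last = _messages.lastIndex
        check(last >= 0 && !_messages[last].isUser) { "call beginTurn() first" }
        _messages[last] = _messages[last].copy(content = _messages[last].content + token)
        return last
    }
}
```

The returned index could be passed to `RecyclerView.Adapter.notifyItemChanged(index)` on the main thread, so that only the growing assistant message is redrawn for each token.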