Implementation:Ggml org Llama cpp Android MainActivity

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Android, UI
Last Updated	2026-02-15 00:00 GMT

Overview

Main Android activity that provides a chat UI for interacting with a locally loaded GGUF language model on an Android device.

Description

`MainActivity` extends `AppCompatActivity` and manages a RecyclerView-based chat interface that shows user and assistant messages. On launch, it obtains the Arm AI Chat `InferenceEngine` via `AiChat.getInferenceEngine()`. The floating action button (FAB) either prompts the user to select a GGUF model file (via the `OpenDocument` contract) or sends the user's input to the engine. When a model is selected, the activity parses its GGUF metadata, copies the file to internal storage, loads it into the engine, and sets a system prompt. Generated tokens are then streamed into the chat via Kotlin Flow.
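The metadata parsing step depends on the GGUF container layout. The following is a minimal sketch of reading the fixed-size header only, assuming the GGUF version 2 or later layout (4-byte ASCII magic `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata key/value count). It is not the `GgufMetadataReader` API; `GgufHeader` and `readGgufHeader` are hypothetical names used for illustration.

```kotlin
import java.io.DataInputStream
import java.io.EOFException
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical holder for the fixed-size GGUF header fields.
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

// Reads the 24-byte GGUF header. Returns null when the stream is too short
// or the magic bytes do not match.
fun readGgufHeader(input: InputStream): GgufHeader? {
    val bytes = ByteArray(24)
    try {
        DataInputStream(input).readFully(bytes)
    } catch (e: EOFException) {
        return null
    }
    // The magic is the ASCII bytes 'G', 'G', 'U', 'F'.
    if (String(bytes, 0, 4, Charsets.US_ASCII) != "GGUF") return null
    // The remaining 20 bytes are little-endian: uint32, uint64, uint64.
    val buf = ByteBuffer.wrap(bytes, 4, 20).order(ByteOrder.LITTLE_ENDIAN)
    return GgufHeader(buf.int, buf.long, buf.long)
}
```

Checking the magic before copying a multi-gigabyte file is a cheap way to reject a wrong selection early; the key/value pairs that follow the header would still need a full reader.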

Usage

Use this as the main entry point and reference implementation for the llama.cpp Android example app, demonstrating end-to-end on-device LLM inference with model loading, chat templating, and streaming token generation.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt
Lines: 1-275

Signature

class MainActivity : AppCompatActivity() {
    private lateinit var engine: InferenceEngine
    private var generationJob: Job? = null

    override fun onCreate(savedInstanceState: Bundle?)
    private fun handleUserInput()
    private fun loadModel(uri: Uri)
}

Import

import android.net.Uri
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import com.arm.aichat.AiChat
import com.arm.aichat.InferenceEngine
import com.arm.aichat.gguf.GgufMetadata
import com.arm.aichat.gguf.GgufMetadataReader
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.launch

I/O Contract

Inputs

Name	Type	Required	Description
GGUF file	Uri	Yes	User-selected GGUF model file via Android document picker
user input	String	Yes	User chat message entered in the EditText field
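The GGUF file input arrives as a content `Uri`, so it has to be streamed into app-private storage before the engine can open it by path. The sketch below shows only the copy step in plain Kotlin. `copyModelToDir` is a hypothetical helper; on Android the stream would typically come from `contentResolver.openInputStream(uri)` and the directory from `filesDir`, which is an assumption about how the activity wires it up.

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical helper: streams a picked model file into a target directory.
// On Android, `input` would typically be contentResolver.openInputStream(uri)
// and `destDir` would be Context.filesDir (assumptions, not confirmed by the source).
fun copyModelToDir(input: InputStream, destDir: File, fileName: String): File {
    val dest = File(destDir, fileName)
    // `use` closes both streams even if the copy throws.
    input.use { src ->
        dest.outputStream().use { out ->
            src.copyTo(out) // buffered copy, never holds the whole file in memory
        }
    }
    return dest
}
```

Because model files can be several gigabytes, a copy like this belongs on a background dispatcher such as `Dispatchers.IO` rather than on the main thread.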

Outputs

Name	Type	Description
chat messages	RecyclerView	Streamed assistant response tokens displayed in a chat interface
GGUF metadata	TextView	Model metadata displayed in the header

Usage Examples

// Initialize the inference engine off the main thread
lifecycleScope.launch(Dispatchers.Default) {
    engine = AiChat.getInferenceEngine(applicationContext)
}

// Send the user prompt and collect streamed tokens.
// collect is a suspend function, so it must run inside a coroutine.
lifecycleScope.launch {
    engine.sendUserPrompt(userMessage)
        .onCompletion { /* handle completion */ }
        .collect { token -> appendToChat(token) }
}
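The streaming pattern above can be tried outside Android with a stand-in token source. This is a minimal sketch, assuming only `kotlinx.coroutines`: `fakeTokenStream` is a hypothetical replacement for `engine.sendUserPrompt(...)`, and `collectReply` is a hypothetical helper that grows one message as tokens arrive, the way a single chat bubble would be updated.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.onCompletion

// Hypothetical stand-in for engine.sendUserPrompt(...): emits fixed tokens.
fun fakeTokenStream(): Flow<String> = flowOf("Hel", "lo", ", ", "world")

// Appends each token to a growing reply and reports every intermediate state.
// onDone receives null on normal completion, or the failure/cancellation cause.
suspend fun collectReply(
    tokens: Flow<String>,
    onDone: (Throwable?) -> Unit = {},
    onUpdate: (String) -> Unit,
): String {
    val reply = StringBuilder()
    tokens
        .onCompletion { cause -> onDone(cause) }
        .collect { token ->
            reply.append(token)
            onUpdate(reply.toString())
        }
    return reply.toString()
}
```

Calling `collectReply(fakeTokenStream()) { text -> println(text) }` from a coroutine prints the partial reply after each token. In the activity, launching the collection in a `Job` (for example the `generationJob` field) would allow an in-flight generation to be cancelled.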

Related Pages

Principle:Ggml_org_Llama_cpp_Android_Integration
