Principle:Neuml Txtai Agent LLM Configuration

Overview

The language model is the reasoning backbone of every agent. It decides which tools to call, how to interpret results, and when to produce a final answer. The Agent LLM Configuration principle covers how the LLM backend is selected, configured, and integrated into the agent execution pipeline in txtai.

LLM as Agent Backbone

In an agent architecture, the LLM serves multiple roles simultaneously:

Planner -- It analyses the user request and determines what steps are needed.
Tool selector -- It chooses which tool(s) to invoke based on tool descriptions in the system prompt.
Argument generator -- It produces structured arguments (typically JSON) for the selected tool.
Synthesiser -- It combines tool outputs with prior reasoning to produce the final answer.

Because the LLM drives every phase of the agent loop, its choice has outsized impact on agent quality. A model that struggles with structured output or tool-calling conventions will produce unreliable agents regardless of how well the tools are defined.

Model Selection for Tool-Calling Capability

Not all language models are equally suited for agent workflows. Key factors include:

Instruction following -- The model must reliably follow system prompts that describe tools and output formats.
Structured output -- The model should be able to emit well-formed JSON or action blocks that the orchestrator can parse.
Reasoning depth -- Multi-step problems require the model to plan ahead and reason about intermediate results.
Context length -- Agent prompts are large (system instructions + tool descriptions + conversation history + tool outputs). The model must handle this context without degradation.

txtai supports both local models (via its own LLM pipeline) and API-hosted models. The PipelineModel class abstracts over both, presenting a uniform interface to the agent orchestrator.

The PipelineModel Abstraction

The PipelineModel class bridges txtai's LLM pipeline with the smolagents.Model interface. This abstraction provides several benefits:

Framework agnostic -- The same agent code works whether the underlying model is a local Hugging Face model, an OpenAI API endpoint, or any other backend supported by txtai's LLM pipeline.
Automatic detection -- When a model path is provided as a string, the LLM pipeline infers the appropriate framework (e.g., Hugging Face Transformers, llama.cpp, API provider).
Vision support -- The model automatically detects whether the underlying LLM supports vision inputs and adjusts message flattening behaviour accordingly.
Parameter control -- Runtime parameters like maxlength (maximum generation length) can be adjusted per call.

Configuration Approaches

Dictionary Configuration

The most common approach is to pass a dictionary with the model path and any additional keyword arguments:

model_config = {
    "path": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    "quantize": True
}

The dictionary is unpacked into PipelineModel(path=..., **kwargs).

Pre-built LLM Instance

An existing LLM pipeline instance can be passed directly. This is useful when the same model is shared across agents or other pipeline components:

from txtai.pipeline import LLM

llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct")
# llm is passed directly to Agent via the "model" key

String Path

A simple string model path can also be passed. The PipelineModel will construct the LLM pipeline internally.

Message Handling

The PipelineModel handles the conversion between the agent framework's message format and the underlying LLM's expected input:

Messages are cleaned and normalised using smolagents.get_clean_message_list.
Role enums are converted to plain strings for compatibility across LLM frameworks.
Stop sequences are applied post-generation to trim the output.
Tool call actions are extracted from the response text using regex parsing when tools_to_call_from is provided.

Design Considerations

Model-Agent Coupling

The model configuration is tightly coupled to agent performance. Changing the model may require adjusting:

max_steps -- More capable models need fewer steps; weaker models may need more iterations.
Tool descriptions -- Some models respond better to terser descriptions; others benefit from verbose explanations.
Temperature -- Lower temperatures produce more deterministic tool calls; higher temperatures can help with creative problem-solving.

Local vs. API Models

Consideration	Local Models	API Models
Latency	Higher per-token latency	Lower per-token latency
Cost	One-time hardware cost	Pay-per-token
Privacy	Data stays local	Data sent to provider
Model size	Limited by hardware	Access to largest models
Reliability	No external dependencies	Subject to rate limits and outages

txtai's PipelineModel treats both uniformly, so switching between local and API models requires only a configuration change.

Relationship to the Agent Execution Workflow

Within the Agent_Execution workflow, LLM configuration is the second step:

Define tools (embeddings, functions, skills).
Configure the LLM -- Select and initialise the model that will drive agent reasoning.
Create the agent with tools and model.
Run agent tasks.

The model must be configured before the agent is created because the ProcessFactory constructs the PipelineModel during agent initialisation.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment