Principle:Neuml Txtai Agent Orchestration

Overview

Agent Orchestration is the principle of assembling an autonomous agent that can select and chain tools to answer complex, multi-faceted user requests. In txtai, orchestration means bringing together a configured LLM, a set of tools, optional instructions, and an execution strategy (tool-calling or code-generation) into a single cohesive Agent object.

What Is an Agent?

An agent is a system that goes beyond simple prompt-response interaction. Rather than producing a single answer in one pass, an agent:

Reasons about the user's request and breaks it into sub-tasks.
Selects the most appropriate tool for each sub-task.
Executes the tool and observes the result.
Iterates -- potentially calling additional tools -- until it has enough information to produce a final answer.

This iterative loop is what distinguishes agents from simpler patterns like retrieval-augmented generation (RAG), where the retrieval step is fixed and predetermined.

Agent Loops

The core of any agent is its execution loop (also called the agent loop or reasoning loop). In txtai, the loop follows the ReAct (Reason + Act) pattern:

Thought -- The LLM reasons about what to do next given the current context.
Action -- The LLM emits a tool call (name + arguments).
Observation -- The orchestrator executes the tool and feeds the result back.
Repeat -- Steps 1-3 repeat until the model decides it has a final answer, or a maximum step count is reached.

The loop is managed by the underlying smolagents framework, which txtai wraps with its own Agent and ProcessFactory classes.

Tool-Calling Agents vs. Code Agents

txtai supports two agent execution strategies through the method configuration parameter:

Tool-Calling Agents (Default)

Tool-calling agents use structured action blocks to invoke tools. The LLM outputs a JSON-like structure specifying the tool name and arguments. The orchestrator parses this structure, calls the tool, and injects the result.

Advantages:

Predictable, parseable output format.
Easy to audit and log tool usage.
Works well with models fine-tuned for function calling.

Disadvantages:

Limited to one tool call per reasoning step (in most implementations).
Cannot perform arbitrary computation between tool calls.

Code Agents

Code agents generate Python code that can call tools as functions. The orchestrator executes the code in a sandboxed environment.

Advantages:

Can chain multiple tool calls in a single step.
Can perform intermediate computation (math, string manipulation, conditionals).
More natural for complex data transformations.

Disadvantages:

More difficult to sandbox safely.
Output parsing is more complex.
Requires models with strong code generation capabilities.

The choice between these strategies is made at agent construction time via method="code" for code agents; the default creates a tool-calling agent.

Multi-Step Reasoning

Complex requests often cannot be answered by a single tool call. Consider the question: "Find recent papers about climate change and summarise the key findings." An agent might:

Call a search tool to find relevant papers.
Call a web viewer tool to read the full text of the top results.
Use its own reasoning ability to synthesise the findings into a summary.

Each step builds on the previous one. The agent must maintain context across steps, interpret intermediate results, and decide when it has enough information to stop. This multi-step reasoning capability is what makes agents suitable for complex tasks that would be difficult to express as a single pipeline.

The ProcessFactory

The ProcessFactory is the internal component that assembles the agent process runner. It:

Selects the agent class: ToolCallingAgent (default) or CodeAgent (when method="code").
Constructs the PipelineModel from the model configuration.
Calls ToolFactory.create(config) to build the tool list.
Instantiates the chosen agent class with the tools, model, and remaining configuration.

This factory pattern keeps the Agent class itself clean and focused on prompt generation and memory management.

Agent Configuration

Key configuration parameters that affect orchestration:

Parameter	Description	Default
`tools`	List of tool specifications (callables, dicts, strings, Tool instances).	`[]`
`model` / `llm`	LLM configuration (dict or path string or LLM instance).	Required
`method`	Agent type: `None` for tool-calling, `"code"` for code agent.	`None`
`max_steps`	Maximum number of reasoning steps before the agent stops.	Framework default
`instructions`	Custom system instructions (string or path to a `.md` file).	`None`
`template`	Jinja template for prompt formatting. Must include `Template:Text` and `Template:Memory`.	Built-in default
`memory`	Number of prior interactions to keep as conversational memory.	`None`

When to Use Agents vs. Other Patterns

Agents excel when:

The task requires multiple tools or multiple steps.
The optimal sequence of operations is not known in advance.
The task benefits from adaptive reasoning (the next step depends on the results of previous steps).

Agents are not the best choice when:

The task is simple and a single pipeline or RAG call suffices.
The process is rule-based and deterministic (use Workflows instead).
Latency is critical and the iterative loop introduces unacceptable delay.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment