Implementation:Neuml Txtai LiteLLM Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, NLP, LLM, API Integration |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool for accessing LLM APIs (OpenAI, Anthropic, Cohere, and others) through the LiteLLM unified interface provided by txtai.
Description
LiteLLM is a generative model backend that extends the Generation base class and delegates inference to the litellm library. This lets txtai call more than 100 LLM providers (OpenAI, Anthropic, Azure, AWS Bedrock, Google Vertex, Cohere, Hugging Face Inference API, etc.) through a single unified interface. The static ismodel method detects whether a given model path corresponds to a LiteLLM-supported provider, excluding paths that resolve to Hugging Face Hub models. The class supports both streaming and non-streaming responses.
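The detection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the txtai source: `is_litellm_model`, `resolve_provider`, `is_hub_model`, and the two stand-in lookups are hypothetical names introduced here. In a real setup the resolver might be litellm's `get_llm_provider`, which is assumed to raise for unknown model ids.

```python
def is_litellm_model(path, resolve_provider, is_hub_model):
    """Return True when `path` looks like a LiteLLM-routable model id.

    resolve_provider: hypothetical hook that returns provider info for a
        model id and raises when the id is unknown.
    is_hub_model: hypothetical hook that returns True when the id names a
        Hugging Face Hub repository.
    """
    if not isinstance(path, str):
        return False
    try:
        provider = resolve_provider(path)
    except Exception:
        # Unknown ids are treated as "not a LiteLLM model"
        return False
    # A provider must be resolved AND the path must not be a Hub model
    return bool(provider) and not is_hub_model(path)


# Stand-in lookups used only for this illustration
KNOWN = {"gpt-4": "openai", "anthropic/claude-3-sonnet": "anthropic"}


def fake_resolver(path):
    return KNOWN[path]  # raises KeyError for unknown ids


def fake_hub_check(path):
    return path.startswith("sentence-transformers/")


print(is_litellm_model("gpt-4", fake_resolver, fake_hub_check))
print(is_litellm_model("sentence-transformers/all-MiniLM-L6-v2", fake_resolver, fake_hub_check))
```

Passing the two checks in as callables keeps the sketch runnable without network access or API keys; the real class performs both checks internally.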
Usage
Use LiteLLM when you want to call cloud-hosted LLM APIs from within a txtai pipeline. It is automatically selected when the model path matches a known LiteLLM provider (e.g. "gpt-4", "claude-3-opus", "anthropic/claude-3-sonnet").
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/llm/litellm.py
Signature
class LiteLLM(Generation):
    @staticmethod
    def ismodel(path)
    @staticmethod
    def ishub(path)
    def __init__(self, path, template=None, **kwargs)
    def stream(self, texts, maxlength, stream, stop, **kwargs)
Import
from txtai.pipeline.llm.litellm import LiteLLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Model identifier recognized by LiteLLM (e.g. "gpt-4", "anthropic/claude-3-sonnet", "cohere/command-r"). |
| template | str | No | Prompt template string applied to text inputs. |
| kwargs | dict | No | Additional keyword arguments passed to litellm.completion. Common pipeline params (quantize, gpu, model, task) are automatically filtered out. |
| texts | list | Yes (stream) | List of prompts; each can be a string or a list of chat message dicts. |
| maxlength | int | Yes (stream) | Maximum number of tokens to generate (passed as max_tokens). |
| stream | bool | Yes (stream) | If True, streams the response. |
| stop | list | Yes (stream) | List of stop strings. |
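As a rough illustration of how these inputs might be turned into a completion request, the sketch below builds the keyword arguments for a single prompt. `build_request` and `PIPELINE_ONLY` are hypothetical names, and wrapping a plain string as a single user message is an assumption made for this example rather than something the table states.

```python
# Pipeline-only options that should not reach the completion call
PIPELINE_ONLY = {"quantize", "gpu", "model", "task"}


def build_request(path, text, maxlength, stream, stop, **kwargs):
    """Assemble completion keyword arguments for one prompt (illustrative)."""
    # Assumption: a plain string becomes a single user message, while a
    # list is taken to already hold chat message dicts
    if isinstance(text, str):
        messages = [{"role": "user", "content": text}]
    else:
        messages = text

    # Drop pipeline-only options, keep everything else
    extra = {k: v for k, v in kwargs.items() if k not in PIPELINE_ONLY}

    return {
        "model": path,
        "messages": messages,
        "max_tokens": maxlength,  # maxlength is passed as max_tokens
        "stream": stream,
        "stop": stop,
        **extra,
    }


request = build_request("gpt-4", "Hi", 64, False, None, temperature=0.2, gpu=True)
print(sorted(request))
```

The resulting dictionary could then be unpacked into a call such as `litellm.completion(**request)`, which is not executed here because it needs provider credentials.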
Outputs
| Name | Type | Description |
|---|---|---|
| result | generator | Yields generated text chunks from the LLM API response. |
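A generator that handles both response modes could look roughly like the sketch below. It uses plain dicts as stand-ins for OpenAI-style response objects (an assumption made so the example runs without an API call), and `iter_text` is a hypothetical helper name.

```python
def iter_text(response, stream):
    """Yield generated text from an OpenAI-style response (illustrative)."""
    if stream:
        # Streaming: each chunk carries an incremental delta
        for chunk in response:
            text = chunk["choices"][0]["delta"].get("content")
            if text:
                # Skip chunks without text, such as a final stop chunk
                yield text
    else:
        # Non-streaming: the full message arrives in one response
        yield response["choices"][0]["message"]["content"]


# Stand-in payloads shaped like chat completion responses
full = {"choices": [{"message": {"content": "Paris"}}]}
chunks = [
    {"choices": [{"delta": {"content": "Pa"}}]},
    {"choices": [{"delta": {"content": "ris"}}]},
    {"choices": [{"delta": {}}]},
]

print("".join(iter_text(full, stream=False)))
print("".join(iter_text(chunks, stream=True)))
```

Returning a generator in both modes gives callers one consumption pattern whether or not streaming is enabled.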
Usage Examples
from txtai.pipeline import LLM
# Use OpenAI GPT-4 via LiteLLM (requires provider credentials, e.g. the OPENAI_API_KEY environment variable)
llm = LLM("gpt-4")
result = llm("What is the capital of France?")
print(result)
# Use Anthropic Claude via LiteLLM
llm = LLM("anthropic/claude-3-sonnet-20240229")
result = llm("Explain quantum computing in simple terms")
# Chat-style input
result = llm([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the theory of relativity."}
])
# Streaming
for token in llm("Tell me about space exploration", stream=True):
    print(token, end="")