Implementation:Langchain ai Langchain OllamaLLM
| Knowledge Sources | |
|---|---|
| Domains | LLM, Ollama, Local Inference |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
OllamaLLM is a LangChain LLM integration that provides text generation using models hosted on a locally running Ollama server.
Description
The OllamaLLM class, defined in the langchain-ollama partner package, extends BaseLLM from langchain-core. It connects to a local Ollama server through the official ollama Python client and generates text completions with locally hosted models. Generation is streaming-first: the class always streams from the server internally and aggregates the chunks into the final result. It supports reasoning/thinking mode for compatible models (reasoning content is captured separately in generation_info), configurable sampling parameters (temperature, top_k, top_p, mirostat, seed, etc.), JSON output format, custom keep-alive durations, and both synchronous and asynchronous operation. For proxied Ollama servers, URL-based authentication is supported through userinfo credentials in base_url.
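The URL-based authentication mentioned above relies on credentials carried in the userinfo part of the URL (user:password@host). The following is a minimal sketch of that idea using only the standard library. It is not the package's own code: the helper name split_userinfo and the example host are hypothetical, and the assumption that credentials end up in a Basic Authorization header is illustrative.

```python
from base64 import b64encode
from urllib.parse import unquote, urlsplit, urlunsplit


def split_userinfo(url: str) -> tuple[str, dict[str, str]]:
    """Return (url_without_credentials, headers) for a URL that may carry userinfo.

    Hypothetical helper for illustration; not part of langchain-ollama.
    """
    parts = urlsplit(url)
    if parts.username is None:
        # No "user:password@" prefix, so there is nothing to strip.
        return url, {}

    # Rebuild the network location without the credentials.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

    # Percent-encoded characters in the credentials are decoded before encoding.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = b64encode(f"{user}:{password}".encode()).decode()
    return clean, {"Authorization": f"Basic {token}"}


# Example with a made-up proxy address.
clean_url, headers = split_userinfo("https://alice:s3cret@ollama.example.com:11434")
print(clean_url)
print(sorted(headers))
```

Keeping the credentials out of the request URL and in a header avoids leaking them into logs that record URLs, which is one reason a client might perform this split.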
Usage
Import this class when you need text completion (non-chat) from models hosted locally through Ollama, such as for text generation, summarization, or code completion in local or air-gapped environments.
Code Reference
Source Location
- Repository: Langchain_ai_Langchain
- File: libs/partners/ollama/langchain_ollama/llms.py
- Lines: 1-549
Signature
class OllamaLLM(BaseLLM):
    model: str
    reasoning: bool | None = None
    validate_model_on_init: bool = False
    mirostat: int | None = None
    mirostat_eta: float | None = None
    mirostat_tau: float | None = None
    num_ctx: int | None = None
    num_gpu: int | None = None
    num_thread: int | None = None
    num_predict: int | None = None
    repeat_last_n: int | None = None
    repeat_penalty: float | None = None
    temperature: float | None = None
    seed: int | None = None
    stop: list[str] | None = None
    tfs_z: float | None = None
    top_k: int | None = None
    top_p: float | None = None
    format: Literal["", "json"] = ""
    keep_alive: int | str | None = None
    base_url: str | None = None
    client_kwargs: dict | None = {}
    async_client_kwargs: dict | None = {}
    sync_client_kwargs: dict | None = {}
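Most sampling fields in the signature default to None. One plausible reading is that None means "not set by the caller", so the server-side default applies. The sketch below illustrates that pattern with a hypothetical build_options helper; it is an assumption about how unset options could be handled, not a copy of the class's internals.

```python
from typing import Any


def build_options(**sampling: Any) -> dict[str, Any]:
    """Keep only the options the caller actually set (value is not None).

    Hypothetical helper for illustration; not part of langchain-ollama.
    """
    return {name: value for name, value in sampling.items() if value is not None}


# top_k and top_p are left unset, so they are omitted from the payload.
options = build_options(temperature=0.7, top_k=None, top_p=None, seed=42, num_predict=256)
print(options)  # {'temperature': 0.7, 'seed': 42, 'num_predict': 256}
```

The filter tests for None explicitly rather than truthiness, so legitimate values such as temperature=0.0 or seed=0 are preserved.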
Import
from langchain_ollama import OllamaLLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Name of the Ollama model to use (e.g. "llama3.1"). |
| base_url | str or None | No | Base URL where the Ollama server is hosted. Defaults to Ollama client default. Supports userinfo auth. |
| reasoning | bool or None | No | Controls reasoning/thinking mode for supported models. True enables, False disables, None uses model default. |
| temperature | float or None | No | Sampling temperature. Higher values produce more creative output. Defaults to 0.8. |
| num_predict | int or None | No | Maximum tokens to generate. Defaults to 128. -1 for infinite, -2 to fill context. |
| seed | int or None | No | Random seed for reproducible generation. |
| format | str | No | Output format. Set to "json" for JSON output. Defaults to "". |
| keep_alive | int or str or None | No | How long the model stays loaded in memory after a request. |
| stop | list[str] or None | No | Stop tokens to halt generation. |
| top_k | int or None | No | Limits next-token selection to K most probable tokens. Defaults to 40. |
| top_p | float or None | No | Nucleus sampling parameter. Defaults to 0.9. |
| validate_model_on_init | bool | No | Whether to validate the model exists locally on init. Defaults to False. |
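With format set to "json", the completion is still returned to the caller as a string, so it needs to be parsed before use. The sketch below shows one defensive way to do that. The helper name parse_json_reply and the sample reply are hypothetical, and the sketch assumes the reply is meant to be a single JSON object. A reply cut short (for example by a low num_predict) could fail to parse, which is why the error path is explicit.

```python
import json
from typing import Any


def parse_json_reply(text: str) -> dict[str, Any]:
    """Parse a model reply expected to be a JSON object; raise ValueError otherwise.

    Hypothetical helper for illustration; not part of langchain-ollama.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is valid JSON but not an object")
    return data


# Stand-in for the string that model.invoke(...) would return
# from an OllamaLLM constructed with format="json".
sample_reply = '{"city": "Paris", "country": "France"}'
print(parse_json_reply(sample_reply)["city"])  # Paris
```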
Outputs
| Name | Type | Description |
|---|---|---|
| LLMResult | LLMResult | Contains GenerationChunk objects with generated text and optional generation info (thinking content, finish reason). |
| GenerationChunk | Iterator[GenerationChunk] | When streaming, yields chunks of generated text with optional reasoning content in generation_info. |
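The relationship between the two outputs, streamed chunks on one side and a single aggregated result on the other, can be sketched as follows. The Chunk dataclass and aggregate function are hypothetical stand-ins written for this page. They do not reproduce GenerationChunk or the class's internal aggregation, and the field name reasoning is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed piece of a completion."""

    text: str = ""
    reasoning: Optional[str] = None


def aggregate(chunks: Iterable[Chunk]) -> Chunk:
    """Concatenate streamed text and reasoning into one final chunk."""
    text_parts: list[str] = []
    reasoning_parts: list[str] = []
    for chunk in chunks:
        text_parts.append(chunk.text)
        if chunk.reasoning:
            # Reasoning is kept apart from the answer text.
            reasoning_parts.append(chunk.reasoning)
    return Chunk(text="".join(text_parts), reasoning="".join(reasoning_parts) or None)


streamed = [Chunk(reasoning="Greet the user. "), Chunk(text="Hello"), Chunk(text=" world")]
final = aggregate(streamed)
print(final.text)       # Hello world
print(final.reasoning)  # Greet the user. 
```

Because the aggregate is built from the same chunks a streaming caller would see, streaming and non-streaming calls can share one code path on the request side.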
Usage Examples
Basic Usage
from langchain_ollama import OllamaLLM
model = OllamaLLM(
    model="llama3.1",
    temperature=0.7,
    num_predict=256,
)
response = model.invoke("The meaning of life is ")
print(response)
Streaming
from langchain_ollama import OllamaLLM
model = OllamaLLM(model="llama3.1")
# Print each chunk as soon as it arrives.
for chunk in model.stream("The meaning of life is "):
    print(chunk, end="", flush=True)
Async Usage
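import asyncio
from langchain_ollama import OllamaLLM

async def main() -> None:
    model = OllamaLLM(model="llama3.1")
    # await is only valid inside a coroutine, so the call is wrapped in main().
    response = await model.ainvoke("The meaning of life is ")
    print(response)

asyncio.run(main())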