Implementation:Langchain ai Langchain OllamaLLM

Knowledge Sources	Langchain_ai_Langchain
Domains	LLM, Ollama, Local Inference
Last Updated	2026-02-11 00:00 GMT

Overview

OllamaLLM is a LangChain LLM integration that provides text generation using models hosted on a locally running Ollama server.

Description

The OllamaLLM class is defined in the langchain-ollama partner package and extends BaseLLM from langchain-core. It connects to an Ollama server through the official ollama Python client and generates text completions from the models that server hosts. Generation is streaming-first: every call streams internally, and non-streaming calls aggregate the streamed chunks into a single result. The class supports reasoning/thinking mode for compatible models (reasoning content is captured separately in generation_info), configurable sampling parameters (temperature, top_k, top_p, mirostat, seed, etc.), JSON output format, custom keep-alive durations, and both synchronous and asynchronous operation. For proxied Ollama servers, credentials can be supplied as userinfo in the base URL.
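
The streaming-first aggregation described above can be illustrated with a minimal, library-free sketch. It is not the code from llms.py: the function name `aggregate_stream` and the fake chunks are hypothetical, and the chunk keys (`response`, `thinking`, `done`, `done_reason`) are assumed to mirror the fields of an Ollama generate response.

```python
from typing import Any, Iterable


def aggregate_stream(chunks: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold streamed chunks into one result, keeping reasoning text separate."""
    text_parts: list[str] = []
    thinking_parts: list[str] = []
    final_info: dict[str, Any] = {}
    received = False

    for chunk in chunks:
        received = True
        text_parts.append(chunk.get("response", ""))
        if chunk.get("thinking"):
            thinking_parts.append(chunk["thinking"])
        if chunk.get("done") is True:
            # Keep metadata from the terminating chunk only.
            final_info = {
                k: v for k, v in chunk.items() if k not in ("response", "thinking")
            }

    if not received:
        raise ValueError("No data received from stream.")

    if thinking_parts:
        # Reasoning is stored next to the metadata, not mixed into the text.
        final_info["thinking"] = "".join(thinking_parts)
    return {"text": "".join(text_parts), "generation_info": final_info}


# Hypothetical chunks standing in for a server stream.
fake_stream = [
    {"thinking": "2+2 is ", "response": "", "done": False},
    {"thinking": "4.", "response": "", "done": False},
    {"response": "The answer ", "done": False},
    {"response": "is 4.", "done": True, "done_reason": "stop"},
]
result = aggregate_stream(fake_stream)
print(result["text"])
```

The point of the design is that callers who want a single string and callers who want incremental output share one code path; only the last step (join versus yield) differs.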

Usage

Import this class when you need text completion (non-chat) from models hosted locally through Ollama, such as for text generation, summarization, or code completion in local or air-gapped environments.

Code Reference

Source Location

Repository: Langchain_ai_Langchain
File: libs/partners/ollama/langchain_ollama/llms.py
Lines: 1-549

Signature

class OllamaLLM(BaseLLM):
    model: str
    reasoning: bool | None = None
    validate_model_on_init: bool = False
    mirostat: int | None = None
    mirostat_eta: float | None = None
    mirostat_tau: float | None = None
    num_ctx: int | None = None
    num_gpu: int | None = None
    num_thread: int | None = None
    num_predict: int | None = None
    repeat_last_n: int | None = None
    repeat_penalty: float | None = None
    temperature: float | None = None
    seed: int | None = None
    stop: list[str] | None = None
    tfs_z: float | None = None
    top_k: int | None = None
    top_p: float | None = None
    format: Literal["", "json"] = ""
    keep_alive: int | str | None = None
    base_url: str | None = None
    client_kwargs: dict | None = {}
    async_client_kwargs: dict | None = {}
    sync_client_kwargs: dict | None = {}

Import

from langchain_ollama import OllamaLLM

I/O Contract

Inputs

Name	Type	Required	Description
model	str	Yes	Name of the Ollama model to use (e.g. `"llama3.1"`).
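base_url	str or None	No	Base URL where the Ollama server is hosted. When left as `None`, the Ollama client's default host is used. Credentials may be embedded in the URL as `userinfo` (e.g. `http://username:password@localhost:11434`).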
reasoning	bool or None	No	Controls reasoning/thinking mode for supported models. `True` enables, `False` disables, `None` uses model default.
temperature	float or None	No	Sampling temperature. Higher values produce more varied output. When left as `None`, Ollama's default applies (0.8).
num_predict	int or None	No	Maximum number of tokens to generate. When left as `None`, Ollama's default applies (128). Use `-1` for unlimited generation, `-2` to fill the context.
seed	int or None	No	Random seed for reproducible generation.
format	Literal["", "json"]	No	Output format. Set to `"json"` to request JSON output. Defaults to `""` (no format constraint).
keep_alive	int or str or None	No	How long the model stays loaded in memory after a request.
stop	list[str] or None	No	Stop tokens to halt generation.
top_k	int or None	No	Limits next-token selection to the K most probable tokens. When left as `None`, Ollama's default applies (40).
top_p	float or None	No	Nucleus sampling threshold. When left as `None`, Ollama's default applies (0.9).
validate_model_on_init	bool	No	Whether to validate the model exists locally on init. Defaults to False.
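
To make the `userinfo` form of `base_url` concrete, here is a hedged sketch of how credentials embedded in a URL could be separated out and turned into an HTTP Basic `Authorization` header. The helper name `split_userinfo` and the sample credentials are hypothetical, and it is an assumption that the library handles the URL in a comparable way; only Python standard library calls are used.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_userinfo(url: str) -> tuple[str, dict[str, str]]:
    """Return (url_without_credentials, headers) for a URL that may carry userinfo."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}

    # Userinfo may be percent-encoded, so decode before building the token.
    username = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()

    # Rebuild the network location without the credentials.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals need brackets
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, {"Authorization": f"Basic {token}"}


clean_url, headers = split_userinfo("http://alice:s3cret@localhost:11434")
print(clean_url)
```

Keeping the credentials out of the request URL and in a header avoids leaking them into logs that record request targets.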

Outputs

Name	Type	Description
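LLMResult	LLMResult	Returned by `generate()`. Contains `GenerationChunk` objects with the generated text and optional generation info (thinking content, finish reason). `invoke()` returns only the text as a `str`.
GenerationChunk	Iterator[GenerationChunk]	Produced internally when streaming: chunks of generated text with optional reasoning content in `generation_info`. The public `stream()` method yields each chunk's text as a `str`.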
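
As a rough illustration of reading these outputs, the sketch below pulls the text and any reasoning content out of an `LLMResult`-shaped object. The helper `text_and_thinking` is hypothetical, and the `"thinking"` key is an assumption about where reasoning content lands in `generation_info`, so it is exposed as a parameter. A `SimpleNamespace` stand-in replaces a real result so the snippet runs without an Ollama server.

```python
from types import SimpleNamespace
from typing import Any, Optional


def text_and_thinking(
    result: Any, thinking_key: str = "thinking"
) -> tuple[str, Optional[str]]:
    """Extract generated text and optional reasoning from the first generation."""
    # LLMResult.generations is a list (one entry per prompt) of lists of generations.
    first = result.generations[0][0]
    info = first.generation_info or {}
    return first.text, info.get(thinking_key)


# Stand-in shaped like an LLMResult; with a live server this would instead be
# something like: OllamaLLM(model=..., reasoning=True).generate(["What is 2+2?"])
fake_result = SimpleNamespace(
    generations=[
        [
            SimpleNamespace(
                text="4",
                generation_info={"thinking": "2+2=4", "done_reason": "stop"},
            )
        ]
    ]
)

text, thinking = text_and_thinking(fake_result)
print(text)
```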

Usage Examples

Basic Usage

from langchain_ollama import OllamaLLM

model = OllamaLLM(
    model="llama3.1",
    temperature=0.7,
    num_predict=256,
)

response = model.invoke("The meaning of life is ")
print(response)

Streaming

from langchain_ollama import OllamaLLM

model = OllamaLLM(model="llama3.1")

for chunk in model.stream("The meaning of life is "):
    # Each chunk is a str; flush so tokens appear as they arrive
    print(chunk, end="", flush=True)

Async Usage

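import asyncio

from langchain_ollama import OllamaLLM

model = OllamaLLM(model="llama3.1")

async def main() -> None:
    # ainvoke must be awaited inside a coroutine
    response = await model.ainvoke("The meaning of life is ")
    print(response)

asyncio.run(main())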
