Implementation:Langchain ai Langchain OllamaLLM

From Leeroopedia
Revision as of 11:24, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Langchain_ai_Langchain_OllamaLLM.md)
Knowledge Sources
Domains LLM, Ollama, Local Inference
Last Updated 2026-02-11 00:00 GMT

Overview

OllamaLLM is a LangChain LLM integration that provides text generation using models hosted on a locally running Ollama server.

Description

The OllamaLLM class, defined in the langchain-ollama partner package, extends BaseLLM from langchain-core. It connects to a local Ollama server through the official ollama Python client and generates text completions with locally hosted models. The class is streaming-first: generation always streams internally, and the streamed chunks are aggregated into the final result. It supports a reasoning/thinking mode for compatible models (reasoning content is captured separately in generation_info), configurable sampling parameters (temperature, top_k, top_p, mirostat, seed, etc.), JSON output format, custom keep-alive durations, and both synchronous and asynchronous operation. For proxied Ollama servers, credentials can be supplied in the base URL (userinfo authentication).
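The streaming-first idea can be illustrated without a server. The following is a minimal conceptual sketch, not the library's implementation: the Chunk class, the fake_stream generator, and the "thinking" and "done_reason" keys are hypothetical stand-ins chosen for illustration. It shows how a non-streaming result might be built by concatenating streamed text while keeping reasoning content separate.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed piece of a completion."""
    text: str
    info: dict = field(default_factory=dict)


def fake_stream() -> Iterator[Chunk]:
    """Simulates a server that streams a completion in three pieces."""
    yield Chunk("Hello", {"thinking": "greet "})
    yield Chunk(", ", {"thinking": "the user"})
    yield Chunk("world", {"done_reason": "stop"})


def aggregate(chunks: Iterable[Chunk]) -> Chunk:
    """Concatenates chunk text and merges info, joining reasoning text."""
    text_parts = []
    thinking_parts = []
    info = {}
    for chunk in chunks:
        text_parts.append(chunk.text)
        for key, value in chunk.info.items():
            if key == "thinking":
                # Reasoning arrives in pieces, so collect it separately.
                thinking_parts.append(value)
            else:
                # For any other key, the most recent chunk wins.
                info[key] = value
    if thinking_parts:
        info["thinking"] = "".join(thinking_parts)
    return Chunk("".join(text_parts), info)


result = aggregate(fake_stream())
print(result.text)
print(result.info)
```

In this sketch the generated text and the reasoning text never mix: the former ends up in the text field, the latter under a separate key in the info dictionary.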

Usage

Import this class when you need text completion (non-chat) from models hosted locally through Ollama, such as for text generation, summarization, or code completion in local or air-gapped environments.

Code Reference

Source Location

Signature

class OllamaLLM(BaseLLM):
    model: str
    reasoning: bool | None = None
    validate_model_on_init: bool = False
    mirostat: int | None = None
    mirostat_eta: float | None = None
    mirostat_tau: float | None = None
    num_ctx: int | None = None
    num_gpu: int | None = None
    num_thread: int | None = None
    num_predict: int | None = None
    repeat_last_n: int | None = None
    repeat_penalty: float | None = None
    temperature: float | None = None
    seed: int | None = None
    stop: list[str] | None = None
    tfs_z: float | None = None
    top_k: int | None = None
    top_p: float | None = None
    format: Literal["", "json"] = ""
    keep_alive: int | str | None = None
    base_url: str | None = None
    client_kwargs: dict | None = {}
    async_client_kwargs: dict | None = {}
    sync_client_kwargs: dict | None = {}

Import

from langchain_ollama import OllamaLLM

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Name of the Ollama model to use (e.g. "llama3.1"). |
| base_url | str or None | No | Base URL where the Ollama server is hosted. Defaults to Ollama client default. Supports userinfo auth. |
| reasoning | bool or None | No | Controls reasoning/thinking mode for supported models. True enables, False disables, None uses model default. |
| temperature | float or None | No | Sampling temperature. Higher values produce more creative output. Defaults to 0.8. |
| num_predict | int or None | No | Maximum tokens to generate. Defaults to 128. -1 for infinite, -2 to fill context. |
| seed | int or None | No | Random seed for reproducible generation. |
| format | str | No | Output format. Set to "json" for JSON output. Defaults to "". |
| keep_alive | int or str or None | No | How long the model stays loaded in memory after a request. |
| stop | list[str] or None | No | Stop tokens to halt generation. |
| top_k | int or None | No | Limits next-token selection to K most probable tokens. Defaults to 40. |
| top_p | float or None | No | Nucleus sampling parameter. Defaults to 0.9. |
| validate_model_on_init | bool | No | Whether to validate the model exists locally on init. Defaults to False. |
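The base_url row mentions userinfo auth, meaning credentials embedded in the URL as `user:password@host`. As a rough illustration of what that involves, the sketch below splits such a URL into a credential-free URL and a header, using only the standard library. It assumes the credentials are forwarded as an HTTP Basic Authorization header, which this page does not spell out. The split_userinfo helper and the example host are hypothetical, and IPv6 literals are not handled.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_userinfo(url: str) -> tuple[str, dict[str, str]]:
    """Returns (url_without_credentials, headers) for a userinfo URL."""
    parts = urlsplit(url)
    if parts.username is None:
        # No credentials embedded: nothing to strip, no header to add.
        return url, {}
    # Rebuild the network location without the "user:password@" prefix.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Userinfo may be percent-encoded, so decode before building the token.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    clean = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return clean, {"Authorization": f"Basic {token}"}


clean_url, headers = split_userinfo(
    "https://alice:s3cret@ollama.example.com:11434"
)
print(clean_url)
print(headers)
```

Keeping credentials out of the final request URL avoids leaking them into logs, which is the usual motivation for moving them into a header.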

Outputs

| Name | Type | Description |
|---|---|---|
| LLMResult | LLMResult | Contains GenerationChunk objects with generated text and optional generation info (thinking content, finish reason). |
| GenerationChunk | Iterator[GenerationChunk] | When streaming, yields chunks of generated text with optional reasoning content in generation_info. |

Usage Examples

Basic Usage

from langchain_ollama import OllamaLLM

model = OllamaLLM(
    model="llama3.1",
    temperature=0.7,
    num_predict=256,
)

response = model.invoke("The meaning of life is ")
print(response)

Streaming

from langchain_ollama import OllamaLLM

model = OllamaLLM(model="llama3.1")

# Each chunk is a plain string; flush so text appears as it arrives.
for chunk in model.stream("The meaning of life is "):
    print(chunk, end="", flush=True)

Async Usage

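import asyncio

from langchain_ollama import OllamaLLM

model = OllamaLLM(model="llama3.1")

async def main() -> None:
    # Top-level await only works in notebooks and async REPLs; wrap it for scripts.
    response = await model.ainvoke("The meaning of life is ")
    print(response)

asyncio.run(main())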
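JSON Output

The format parameter listed under Inputs can be set to "json" to request JSON output. The sketch below is one possible way to consume such a reply defensively, since a reply can still fail to parse (for example, if generation is cut off by num_predict). The parse_json_reply and ask_for_json helpers are hypothetical names introduced here. ask_for_json needs the langchain-ollama package and a running Ollama server, so it is defined but not called, and only the parsing helper is exercised offline.

```python
import json
from typing import Optional


def parse_json_reply(text: str) -> Optional[dict]:
    """Returns the decoded object, or None if the reply is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def ask_for_json(prompt: str) -> Optional[dict]:
    """Queries a model with format="json"; requires a running Ollama server."""
    # Imported lazily so the parsing helper above works without the package.
    from langchain_ollama import OllamaLLM

    model = OllamaLLM(model="llama3.1", format="json", temperature=0)
    return parse_json_reply(model.invoke(prompt))


# Offline check of the parsing helper with hand-written replies.
good = parse_json_reply('{"city": "Paris", "population_millions": 2.1}')
bad = parse_json_reply("Sure! Here is the JSON you asked for:")
print(good)
print(bad)
```

With a server available, a call such as ask_for_json("Return a JSON object with the keys city and country for the Eiffel Tower.") would exercise the full path. Stating the desired JSON shape in the prompt itself generally helps the model comply.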
