Environment: MLflow OpenAI LLM Integration
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Tracing |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Environment for MLflow LLM autologging and tracing with OpenAI, LangChain, and other LLM provider SDKs, including the API credentials each provider requires.
Description
This environment provides the optional LLM integration layer for MLflow tracing and autologging. When an LLM provider SDK (such as OpenAI, Anthropic, Mistral, or LiteLLM) is installed, MLflow can automatically capture API calls as traces, log token usage, and record model inputs/outputs. LangChain integration enables automatic dependency extraction and trace capture across chains, retrievers, and agents.
Usage
Use this environment when you need LLM call tracing and autologging. It is the prerequisite for `mlflow.openai.autolog()`, `mlflow.langchain.autolog()`, and other LLM-specific autolog functions. Also required for GenAI evaluation scorers that use LLM-as-judge patterns.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | All platforms supported |
| Python | >= 3.10 | Same as core MLflow |
| Network | Internet access | Required for LLM API calls |
Dependencies
Python Packages (Choose by Provider)
OpenAI:
- `openai` (for chat completions and embeddings autologging)
Anthropic / Claude:
- `anthropic` (for Claude API autologging)
LangChain:
- `langchain` >= 0.3.19, <= 1.2.9
LiteLLM:
- `litellm` >= 1.0.0, < 2
Mistral:
- `mistralai` (for Mistral API autologging)
GenAI Evaluation (LLM-as-Judge):
- `aiohttp` < 4
- `boto3` >= 1.28.56, < 2
- `litellm` >= 1.0.0, < 2
- `tiktoken` < 1
- `gepa` >= 0.0.26, < 1
Gateway/Deployments:
- `aiohttp` < 4
- `boto3` >= 1.28.56, < 2
- `tiktoken` < 1
- `slowapi` >= 0.1.9, < 1
- `watchfiles` < 2
Credentials
Set the following environment variables, depending on the LLM provider in use:
- `OPENAI_API_KEY`: OpenAI API key (for OpenAI models)
- `ANTHROPIC_API_KEY`: Anthropic API key (for Claude models)
- `MISTRAL_API_KEY`: Mistral AI API key
- `TOGETHERAI_API_KEY`: Together AI API key
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_SESSION_TOKEN`: AWS credentials for Bedrock
- `AWS_REGION` / `AWS_ROLE_ARN`: AWS region and role for Bedrock
- `DATABRICKS_HOST` / `DATABRICKS_TOKEN`: For Databricks-hosted model endpoints
- `MLFLOW_OPENAI_SECRET_SCOPE`: Databricks secret scope for OpenAI keys
- `MLFLOW_DEPLOYMENTS_TARGET`: URI for MLflow AI Gateway
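The provider-detection behavior described above can be sketched with a small stdlib-only helper. This is illustrative, not MLflow's implementation; the variable names come from the credential list, but `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical names:

```python
import os

# Environment variables that signal each provider is configured
# (names taken from the credential list above; mapping is illustrative).
PROVIDER_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "togetherai": ["TOGETHERAI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "databricks": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
}

def configured_providers(environ=os.environ):
    """Return providers whose required variables are all set and non-empty."""
    return [
        name
        for name, keys in PROVIDER_ENV_VARS.items()
        if all(environ.get(k) for k in keys)
    ]
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.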
Quick Install
```bash
# OpenAI autologging
pip install mlflow openai

# LangChain autologging
pip install mlflow langchain

# GenAI evaluation with LLM-as-judge
pip install "mlflow[genai]"

# AI Gateway support
pip install "mlflow[gateway]"
```

Enable autologging in code:

```python
import mlflow

mlflow.openai.autolog()
# or
mlflow.langchain.autolog()
```
Code Evidence
OpenAI autolog wrapper from `mlflow/openai/autolog.py:35-101`:
```python
def patched_call(original, self, *args, **kwargs):
    # Patches OpenAI SDK methods to capture traces.
    # Captures: model, messages, temperature, max_tokens, token usage.
    ...
```
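The wrap-and-record pattern behind that patch can be illustrated with a generic stdlib-only sketch. The names below (`autolog_patch`, `fake_completion`) are invented for illustration and are not MLflow internals:

```python
import functools

def autolog_patch(original, capture):
    """Wrap `original` so each call's arguments and result are recorded."""
    @functools.wraps(original)
    def patched(*args, **kwargs):
        result = original(*args, **kwargs)
        # Record what an autolog patch would capture: inputs and outputs.
        capture({"args": args, "kwargs": kwargs, "result": result})
        return result
    return patched

# Example: patch a stand-in for an SDK client method.
calls = []

def fake_completion(model, messages):
    return {"model": model, "usage": {"total_tokens": 7}}

fake_completion = autolog_patch(fake_completion, calls.append)
fake_completion("gpt-4o", messages=[{"role": "user", "content": "hi"}])
```

The real patch additionally opens a trace span around the call; the sketch only shows the interception mechanics.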
LLM provider API key detection from `mlflow/metrics/genai/model_utils.py`:
```python
# Checks for these API keys via os.environ.get():
# ANTHROPIC_API_KEY, AWS_ROLE_ARN, AWS_REGION,
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN,
# MISTRAL_API_KEY, TOGETHERAI_API_KEY
```
GenAI evaluation worker configuration from `mlflow/environment_variables.py`:
```python
MLFLOW_GENAI_EVAL_MAX_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_WORKERS", int, 10
)
MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS", int, 10
)
```
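A minimal stand-in for that helper shows how a typed environment variable with a default behaves. This is a sketch consistent with the evidence above, not MLflow's actual `_EnvironmentVariable` class, which has more features:

```python
import os

class _EnvironmentVariable:
    """Environment variable with a type converter and a default value."""

    def __init__(self, name, type_, default):
        self.name = name
        self.type = type_
        self.default = default

    def get(self, environ=os.environ):
        # Fall back to the default when unset; otherwise coerce the raw string.
        raw = environ.get(self.name)
        return self.default if raw is None else self.type(raw)

MLFLOW_GENAI_EVAL_MAX_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_WORKERS", int, 10
)
```

So `export MLFLOW_GENAI_EVAL_MAX_WORKERS=4` would lower evaluation concurrency from the default of 10 to 4.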
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'openai'` | OpenAI SDK not installed | `pip install openai` |
| `AuthenticationError: Incorrect API key` | Invalid or missing API key | Set `OPENAI_API_KEY` environment variable |
| `ModuleNotFoundError: No module named 'langchain'` | LangChain not installed | `pip install "langchain>=0.3.19"` |
| `RateLimitError` | API rate limit exceeded | Reduce `MLFLOW_GENAI_EVAL_MAX_WORKERS` to lower concurrency |
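Besides lowering the worker count, rate-limit errors are commonly handled with exponential backoff. The sketch below is a generic pattern, not an MLflow API; `RateLimitError` here is a stand-in for the provider SDK's exception:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider SDK's rate-limit exception."""

def with_backoff(fn, retries=3, base_delay=0.01, sleep=time.sleep):
    """Call fn, retrying with exponential backoff on RateLimitError."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            sleep(base_delay * (2 ** attempt))

# Example: a call that is rate-limited twice before succeeding.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"
```

Injecting `sleep` as a parameter makes the backoff schedule testable without real waiting.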
Compatibility Notes
- OpenAI: Supports chat completions, embeddings, and structured outputs autologging. Also supports the OpenAI Agents SDK and Responses API.
- LangChain: Version range restricted to 0.3.19-1.2.9. Supports chains, retrievers, agents, and chat models.
- Databricks: Can use Databricks-hosted model endpoints via `MLFLOW_DEPLOYMENTS_TARGET` and Foundation Model APIs.
- LLM-as-Judge: GenAI evaluation scorers use litellm under the hood to support multiple LLM providers through a unified interface.
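The unified-interface idea behind the litellm layer can be sketched as a simple `provider/model` identifier split. This parsing helper is illustrative only and does not reflect litellm's or MLflow's actual API:

```python
def parse_model_id(model_id):
    """Split a 'provider/model' identifier into its two parts (illustrative)."""
    provider, _, model = model_id.partition("/")
    if not model:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    return provider, model
```

A dispatcher built on such identifiers can route `openai/gpt-4o` and `anthropic/claude-3-5-sonnet` style names to the matching credentials from the Credentials section.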