Environment: MLflow OpenAI LLM Integration
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Tracing |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Environment for MLflow LLM autologging and tracing with OpenAI, LangChain, and other LLM provider SDKs, including the API credentials each provider requires.
Description
This environment provides the optional LLM integration layer for MLflow tracing and autologging. When an LLM provider SDK (such as OpenAI, Anthropic, Mistral, or LiteLLM) is installed, MLflow can automatically capture API calls as traces, log token usage, and record model inputs/outputs. LangChain integration enables automatic dependency extraction and trace capture across chains, retrievers, and agents.
Usage
Use this environment when you need LLM call tracing and autologging. It is the prerequisite for `mlflow.openai.autolog()`, `mlflow.langchain.autolog()`, and other LLM-specific autolog functions. Also required for GenAI evaluation scorers that use LLM-as-judge patterns.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | All platforms supported |
| Python | >= 3.10 | Same as core MLflow |
| Network | Internet access | Required for LLM API calls |
Dependencies
Python Packages (Choose by Provider)
OpenAI:
- `openai` (for chat completions and embeddings autologging)
Anthropic / Claude:
- `anthropic` (for Claude API autologging)
LangChain:
- `langchain` >= 0.3.19, <= 1.2.9
LiteLLM:
- `litellm` >= 1.0.0, < 2
Mistral:
- `mistralai` (for Mistral API autologging)
GenAI Evaluation (LLM-as-Judge):
- `aiohttp` < 4
- `boto3` >= 1.28.56, < 2
- `litellm` >= 1.0.0, < 2
- `tiktoken` < 1
- `gepa` >= 0.0.26, < 1
Gateway/Deployments:
- `aiohttp` < 4
- `boto3` >= 1.28.56, < 2
- `tiktoken` < 1
- `slowapi` >= 0.1.9, < 1
- `watchfiles` < 2
Credentials
Set the following environment variables, depending on the LLM provider in use:
- `OPENAI_API_KEY`: OpenAI API key (for OpenAI models)
- `ANTHROPIC_API_KEY`: Anthropic API key (for Claude models)
- `MISTRAL_API_KEY`: Mistral AI API key
- `TOGETHERAI_API_KEY`: Together AI API key
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_SESSION_TOKEN`: AWS credentials for Bedrock
- `AWS_REGION` / `AWS_ROLE_ARN`: AWS region and role for Bedrock
- `DATABRICKS_HOST` / `DATABRICKS_TOKEN`: For Databricks-hosted model endpoints
- `MLFLOW_OPENAI_SECRET_SCOPE`: Databricks secret scope for OpenAI keys
- `MLFLOW_DEPLOYMENTS_TARGET`: URI for MLflow AI Gateway
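The provider-detection behavior described above can be sketched with a small stdlib-only helper. This is illustrative, not MLflow's implementation; the variable names come from the credential list, but `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical names:

```python
import os

# Environment variables that signal each provider is configured
# (names taken from the credential list above; mapping is illustrative).
PROVIDER_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "togetherai": ["TOGETHERAI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "databricks": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
}

def configured_providers(environ=os.environ):
    """Return providers whose required variables are all set and non-empty."""
    return [
        name
        for name, keys in PROVIDER_ENV_VARS.items()
        if all(environ.get(k) for k in keys)
    ]
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.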
Quick Install
```bash
# OpenAI autologging
pip install mlflow openai

# LangChain autologging
pip install mlflow langchain

# GenAI evaluation with LLM-as-judge
pip install "mlflow[genai]"

# AI Gateway support
pip install "mlflow[gateway]"
```

Enable autologging in code:

```python
import mlflow

mlflow.openai.autolog()
# or
mlflow.langchain.autolog()
```
Code Evidence
OpenAI autolog wrapper from `mlflow/openai/autolog.py:35-101`:
```python
def patched_call(original, self, *args, **kwargs):
    # Patches OpenAI SDK methods to capture traces.
    # Captures: model, messages, temperature, max_tokens, token usage.
    ...
```
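The wrap-and-record pattern behind that patch can be illustrated with a generic stdlib-only sketch. The names below (`autolog_patch`, `fake_completion`) are invented for illustration and are not MLflow internals:

```python
import functools

def autolog_patch(original, capture):
    """Wrap `original` so each call's arguments and result are recorded."""
    @functools.wraps(original)
    def patched(*args, **kwargs):
        result = original(*args, **kwargs)
        # Record what an autolog patch would capture: inputs and outputs.
        capture({"args": args, "kwargs": kwargs, "result": result})
        return result
    return patched

# Example: patch a stand-in for an SDK client method.
calls = []

def fake_completion(model, messages):
    return {"model": model, "usage": {"total_tokens": 7}}

fake_completion = autolog_patch(fake_completion, calls.append)
fake_completion("gpt-4o", messages=[{"role": "user", "content": "hi"}])
```

The real patch additionally opens a trace span around the call; the sketch only shows the interception mechanics.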
LLM provider API key detection from `mlflow/metrics/genai/model_utils.py`:
```python
# Checks for these API keys via os.environ.get():
# ANTHROPIC_API_KEY, AWS_ROLE_ARN, AWS_REGION,
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN,
# MISTRAL_API_KEY, TOGETHERAI_API_KEY
```
GenAI evaluation worker configuration from `mlflow/environment_variables.py`:
```python
MLFLOW_GENAI_EVAL_MAX_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_WORKERS", int, 10
)
MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS", int, 10
)
```
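A minimal stand-in for that helper shows how a typed environment variable with a default behaves. This is a sketch consistent with the evidence above, not MLflow's actual `_EnvironmentVariable` class, which has more features:

```python
import os

class _EnvironmentVariable:
    """Environment variable with a type converter and a default value."""

    def __init__(self, name, type_, default):
        self.name = name
        self.type = type_
        self.default = default

    def get(self, environ=os.environ):
        # Fall back to the default when unset; otherwise coerce the raw string.
        raw = environ.get(self.name)
        return self.default if raw is None else self.type(raw)

MLFLOW_GENAI_EVAL_MAX_WORKERS = _EnvironmentVariable(
    "MLFLOW_GENAI_EVAL_MAX_WORKERS", int, 10
)
```

So `export MLFLOW_GENAI_EVAL_MAX_WORKERS=4` would lower evaluation concurrency from the default of 10 to 4.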
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'openai'` | OpenAI SDK not installed | `pip install openai` |
| `AuthenticationError: Incorrect API key` | Invalid or missing API key | Set `OPENAI_API_KEY` environment variable |
| `ModuleNotFoundError: No module named 'langchain'` | LangChain not installed | `pip install "langchain>=0.3.19"` |
| `RateLimitError` | API rate limit exceeded | Reduce `MLFLOW_GENAI_EVAL_MAX_WORKERS` to lower concurrency |
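Besides lowering the worker count, rate-limit errors are commonly handled with exponential backoff. The sketch below is a generic pattern, not an MLflow API; `RateLimitError` here is a stand-in for the provider SDK's exception:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider SDK's rate-limit exception."""

def with_backoff(fn, retries=3, base_delay=0.01, sleep=time.sleep):
    """Call fn, retrying with exponential backoff on RateLimitError."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            sleep(base_delay * (2 ** attempt))

# Example: a call that is rate-limited twice before succeeding.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"
```

Injecting `sleep` as a parameter makes the backoff schedule testable without real waiting.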
Compatibility Notes
- OpenAI: Supports chat completions, embeddings, and structured outputs autologging. Also supports the OpenAI Agents SDK and Responses API.
- LangChain: Version range restricted to 0.3.19-1.2.9. Supports chains, retrievers, agents, and chat models.
- Databricks: Can use Databricks-hosted model endpoints via `MLFLOW_DEPLOYMENTS_TARGET` and Foundation Model APIs.
- LLM-as-Judge: GenAI evaluation scorers use litellm under the hood to support multiple LLM providers through a unified interface.
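The unified-interface idea behind the litellm layer can be sketched as a simple `provider/model` identifier split. This parsing helper is illustrative only and does not reflect litellm's or MLflow's actual API:

```python
def parse_model_id(model_id):
    """Split a 'provider/model' identifier into its two parts (illustrative)."""
    provider, _, model = model_id.partition("/")
    if not model:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    return provider, model
```

A dispatcher built on such identifiers can route `openai/gpt-4o` and `anthropic/claude-3-5-sonnet` style names to the matching credentials from the Credentials section.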