Environment:Ucbepic Docetl LLM API Keys
| Field | Value |
|---|---|
| Domains | Infrastructure, LLM_Pipelines, Security |
| Last Updated | 2026-02-08 01:00 GMT |
Overview
API key and credential environment variables required for LLM providers, Azure Document Intelligence, and AWS Bedrock in DocETL pipelines.
Description
DocETL uses LiteLLM as a unified LLM gateway, supporting 100+ providers configured through per-provider API-key environment variables. The primary key is `OPENAI_API_KEY`, but pipelines can target any LiteLLM-supported provider by setting the appropriate environment variable. Additional credentials are needed for Azure Document Intelligence (PDF parsing) and AWS Bedrock. All keys are loaded via `python-dotenv` from a `.env` file at startup.
DocETL also supports encrypted API keys stored directly in pipeline YAML configs, decrypted at runtime using `DOCETL_ENCRYPTION_KEY`.
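To illustrate the loading step, the sketch below mimics what `python-dotenv`'s `load_dotenv()` does with a `.env` file, using only the standard library. It is a simplified stand-in (the real library also handles quoting rules, `export` prefixes, and variable interpolation); the function name `load_dotenv_minimal` is hypothetical.

```python
import os

def load_dotenv_minimal(path=".env"):
    """Simplified sketch of python-dotenv's load_dotenv(): read KEY=VALUE
    lines into os.environ. Variables already set in the environment win."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                # skip blanks, comments, and lines without an assignment
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip().strip('"'))
    except FileNotFoundError:
        pass  # like python-dotenv, silently do nothing when no .env exists

load_dotenv_minimal()
api_key = os.environ.get("OPENAI_API_KEY")  # picked up by LiteLLM at call time
```

Because `setdefault` is used, values exported in the shell take precedence over the `.env` file, which matches the usual dotenv convention.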
Usage
Use this environment for any pipeline that calls an LLM (map, reduce, filter, resolve, equijoin, rank, extract, topk operations). Azure credentials are only needed for Azure Document Intelligence PDF parsing. AWS credentials are only needed when using Bedrock models.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for LLM API calls |
| Storage | `.env` file | Place in project root or Docker volume |
Dependencies
System Packages
- `python-dotenv` >= 1.0.1 (loaded via `load_dotenv()` in multiple entry points)
Credentials
Core LLM Access:
- `OPENAI_API_KEY`: API key for OpenAI models. Required for default model (`gpt-4o-mini`). Also used as fallback for other providers via LiteLLM.
Alternative LLM Providers (set one or more):
- `ANTHROPIC_API_KEY`: For Claude models (format: `sk-ant-...`)
- `GEMINI_API_KEY`: For Google Gemini models
- `COHERE_API_KEY`: For Cohere models
Azure Document Intelligence (for PDF parsing):
- `DOCUMENTINTELLIGENCE_API_KEY`: Azure Document Intelligence API key
- `DOCUMENTINTELLIGENCE_ENDPOINT`: Azure Document Intelligence endpoint URL
- Alternative names also supported: `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`, `AZURE_DOCUMENT_INTELLIGENCE_KEY`
AWS Bedrock (optional):
- `AWS_PROFILE`: AWS profile name (default: `default`)
- `AWS_REGION`: AWS region (default: `us-west-2`)
Encryption:
- `DOCETL_ENCRYPTION_KEY`: Decryption key for encrypted API keys stored in pipeline YAML config
Ollama (local models):
- `OLLAMA_API_BASE`: Base URL for local Ollama instance (e.g., `http://localhost:11434/`)
Data Storage:
- `DOCETL_HOME_DIR`: Override for cache and data directory (default: `~`)
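Since several of these variables are optional, a small pre-flight check can report which providers a given environment actually unlocks. This is a hedged sketch: the grouping and the helper name `configured_providers` are illustrative, but the variable names are the ones documented above.

```python
import os

# Env vars documented on this page, grouped by the capability they unlock.
PROVIDER_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "cohere": ["COHERE_API_KEY"],
    "azure_document_intelligence": [
        "DOCUMENTINTELLIGENCE_API_KEY",
        "DOCUMENTINTELLIGENCE_ENDPOINT",
    ],
    "ollama": ["OLLAMA_API_BASE"],
}

def configured_providers(env=None):
    """Return the providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name for name, keys in PROVIDER_KEYS.items()
        if all(env.get(k) for k in keys)
    ]
```

Note that Azure Document Intelligence only counts as configured when both the key and the endpoint are present, mirroring the runtime check shown in Code Evidence below.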
Quick Install
```bash
# Create .env file with your API key
echo "OPENAI_API_KEY=sk-..." > .env

# For Azure Document Intelligence PDF parsing
echo "DOCUMENTINTELLIGENCE_API_KEY=your-key" >> .env
echo "DOCUMENTINTELLIGENCE_ENDPOINT=https://your-instance.cognitiveservices.azure.com/" >> .env
```
Code Evidence
API key loading via dotenv from `docetl/runner.py:52`:
```python
load_dotenv()
```
Encrypted key decryption from `docetl/config_wrapper.py:56-63`:
```python
encrypted_llm_api_keys = self.config.get("llm_api_keys", {})
if encrypted_llm_api_keys:
    self.llm_api_keys = {
        key: decrypt(value, os.environ.get("DOCETL_ENCRYPTION_KEY", ""))
        for key, value in encrypted_llm_api_keys.items()
    }
```
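DocETL's actual `decrypt` implementation is not reproduced on this page; the sketch below only illustrates the call shape, `decrypt(ciphertext, passphrase) -> plaintext`, with a toy XOR-based scheme. This is NOT DocETL's real cipher and must not be used for actual secrets.

```python
import base64
import hashlib
import itertools

# Toy stand-in for illustration only: XOR against a SHA-256-derived keystream.
# DocETL's real decrypt() lives in its codebase and may differ entirely.
def _xor(data: bytes, passphrase: str) -> bytes:
    keystream = itertools.cycle(hashlib.sha256(passphrase.encode()).digest())
    return bytes(b ^ k for b, k in zip(data, keystream))

def encrypt(plaintext: str, passphrase: str) -> str:
    return base64.b64encode(_xor(plaintext.encode(), passphrase)).decode()

def decrypt(ciphertext: str, passphrase: str) -> str:
    return _xor(base64.b64decode(ciphertext), passphrase).decode()
```

The round trip `decrypt(encrypt(key, passphrase), passphrase)` returns the original key, which is the contract the config-wrapper snippet above relies on when it decrypts `llm_api_keys` with `DOCETL_ENCRYPTION_KEY`.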
Azure Document Intelligence credential check from `docetl/parsing_tools.py:282-290`:
```python
key = os.getenv("DOCUMENTINTELLIGENCE_API_KEY")
endpoint = os.getenv("DOCUMENTINTELLIGENCE_ENDPOINT")
if key is None:
    raise ValueError("DOCUMENTINTELLIGENCE_API_KEY environment variable is not set")
if endpoint is None:
    raise ValueError("DOCUMENTINTELLIGENCE_ENDPOINT environment variable is not set")
```
Backend server env vars from `server/app/main.py:10-15`:
```python
host = os.getenv("BACKEND_HOST", "127.0.0.1")
port = int(os.getenv("BACKEND_PORT", 8000))
reload = os.getenv("BACKEND_RELOAD", "False").lower() == "true"
allow_origins = os.getenv("BACKEND_ALLOW_ORIGINS", "http://localhost:3000").split(",")
```
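Matching the backend defaults above, these variables can be exported before launching the server. The specific values here are illustrative (e.g. the extra origin URL); only the variable names come from the snippet above.

```shell
# Configure the DocETL backend server before launch (values are examples)
export BACKEND_HOST=0.0.0.0        # listen on all interfaces instead of 127.0.0.1
export BACKEND_PORT=8000
export BACKEND_RELOAD=false        # hot reload off outside of development
export BACKEND_ALLOW_ORIGINS="http://localhost:3000,https://app.example.com"
```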
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AuthenticationError` from LiteLLM | `OPENAI_API_KEY` not set or invalid | Set valid API key in `.env` file |
| `ValueError: DOCUMENTINTELLIGENCE_API_KEY environment variable is not set` | Azure DI key missing | Set `DOCUMENTINTELLIGENCE_API_KEY` in `.env` |
| `ValueError: DOCUMENTINTELLIGENCE_ENDPOINT environment variable is not set` | Azure DI endpoint missing | Set `DOCUMENTINTELLIGENCE_ENDPOINT` in `.env` |
| `RateLimitError` | API quota exceeded | Wait for quota reset or upgrade API plan |
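For the `RateLimitError` case, a common mitigation besides waiting is retrying with exponential backoff. The sketch below is generic: the `RateLimitError` class is a stand-in for whatever exception the provider (or LiteLLM) actually raises, and `call_with_backoff` is a hypothetical helper, not part of DocETL.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's rate-limit exception."""

def call_with_backoff(fn, retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            # delays of 1s, 2s, 4s, ... plus up to 1s of random jitter
            sleep(base_delay * 2 ** attempt + random.random())
```

Injecting `sleep` as a parameter keeps the helper testable; in production the `time.sleep` default applies.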
Compatibility Notes
- LiteLLM Providers: Any provider supported by LiteLLM can be used. Set the appropriate environment variable (e.g., `ANTHROPIC_API_KEY` for Claude, `GEMINI_API_KEY` for Gemini).
- Docker: API keys are passed through `docker-compose.yml` environment section. Never bake keys into Docker images.
- Encrypted Keys: Pipeline YAML configs can store encrypted API keys directly, decrypted at runtime via `DOCETL_ENCRYPTION_KEY`.
- Azure Page Limit: Azure Document Intelligence has a hard limit of 200 pages per PDF (`MAX_AZURE_PAGE_LIMIT = 200` in `server/app/routes/convert.py:34`).
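In line with the Docker note above, the usual pattern is to pass the `.env` file through at container start rather than baking keys into the image. The compose fragment below is illustrative: the service and image names, volume layout, and `DOCETL_HOME_DIR` value are assumptions, not DocETL's shipped configuration.

```yaml
# Illustrative docker-compose fragment: keys come from the host's .env at
# runtime, so the image itself contains no secrets.
services:
  docetl:
    image: docetl:latest        # image name is illustrative
    env_file:
      - .env                    # OPENAI_API_KEY, DOCUMENTINTELLIGENCE_*, etc.
    environment:
      - DOCETL_HOME_DIR=/data   # override the cache/data dir in the container
    volumes:
      - ./data:/data
```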
Related Pages
- Implementation:Ucbepic_Docetl_Pipeline_Run
- Implementation:Ucbepic_Docetl_MapOperation_Execute
- Implementation:Ucbepic_Docetl_ReduceOperation_Execute
- Implementation:Ucbepic_Docetl_Pipeline_Optimize
- Implementation:Ucbepic_Docetl_MOARSearch_Search
- Implementation:Ucbepic_Docetl_WebSocket_Run_Pipeline
- Implementation:Ucbepic_Docetl_AI_Chat_And_Prompt_Improvement