Environment:Unstructured IO Unstructured OpenAI API
| Knowledge Sources | |
|---|---|
| Domains | Embeddings |
| Last Updated | 2026-02-12 09:00 GMT |
Overview
The OpenAI_API environment provides the dependencies and configuration needed to generate document embeddings using the OpenAI API via the langchain_openai integration.
Description
The OpenAI embedding encoder in unstructured uses the langchain_openai package as its client interface to the OpenAI API. The openai.py module decorates its get_client() method with @requires_dependencies(["langchain_openai"], extras="openai"), enforcing that the correct extra is installed before any API calls are attempted. The import of langchain_openai is performed lazily inside the get_client() method body, meaning the dependency is only loaded at runtime when embedding is actually requested.
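The pattern described above (a decorator that guards a lazily importing method) can be sketched in a few lines. This is an illustrative stand-in, not the actual `requires_dependencies` implementation in unstructured; the error wording and the two demo functions (`load_stdlib_client`, `load_missing_client`) are assumptions made for the example.

```python
import importlib.util
from functools import wraps


def requires_dependencies(dependencies, extras=None):
    """Illustrative stand-in: fail fast if any named top-level module is missing."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The check runs at call time, not at decoration time, so importing
            # the module that defines `func` never needs the optional package.
            missing = [
                name for name in dependencies
                if importlib.util.find_spec(name) is None
            ]
            if missing:
                if extras:
                    hint = f'pip install "unstructured[{extras}]"'
                else:
                    hint = "pip install " + " ".join(missing)
                raise ImportError(
                    f"Missing dependencies: {', '.join(missing)}. "
                    f"Install with: {hint}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_dependencies(["json"])
def load_stdlib_client():
    import json  # lazy import: resolved only when the function runs
    return json


@requires_dependencies(["hypothetical_missing_pkg_xyz"], extras="openai")
def load_missing_client():
    import hypothetical_missing_pkg_xyz  # never reached when the check fails
    return hypothetical_missing_pkg_xyz
```

Calling `load_missing_client()` raises an ImportError that names the extra to install, while merely defining it costs nothing. That is the property that lets a module like openai.py be imported without langchain_openai present.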
The default embedding model is text-embedding-ada-002. The API key is held in a Pydantic SecretStr field, which is typically populated from the OPENAI_API_KEY environment variable. SecretStr masks the value in its str() and repr() output, which reduces the risk of the key being logged or printed in plain text; the raw value is returned only by an explicit get_secret_value() call.
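A short sketch of how a SecretStr field behaves, assuming pydantic is installed. `EncoderSettings` is a hypothetical model used only for illustration; it is not the configuration class from openai.py, and the key shown is a dummy value.

```python
from pydantic import BaseModel, SecretStr


class EncoderSettings(BaseModel):
    # Hypothetical settings model; only the SecretStr behaviour is the point.
    api_key: SecretStr


settings = EncoderSettings(api_key="sk-not-a-real-key")

# String conversion yields a masked placeholder rather than the key itself.
masked = str(settings.api_key)

# The raw value is available only through an explicit call.
raw = settings.api_key.get_secret_value()

print(masked)          # masked placeholder
print(repr(settings))  # the model repr does not contain the raw key either
```

The explicit `get_secret_value()` call is the design point: code that needs the key has to ask for it by name, so an incidental `print(settings)` or log statement does not leak it.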
Usage
This environment is required when using the OpenAIEmbeddingEncoder to generate vector embeddings for document elements. This is typically used in retrieval-augmented generation (RAG) pipelines where partitioned document elements need to be embedded for semantic search.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | >= 3.11, < 3.14 | Required Python version range |
| OS | Any | No OS-specific requirements; API calls are network-based |
| Network | Internet access required | Must be able to reach the OpenAI API endpoint |
Dependencies
System Packages
- No system packages required beyond Python itself
Python Packages
- langchain_openai -- LangChain wrapper for OpenAI API (installed via the openai extra)
- openai -- underlying OpenAI Python client (transitive dependency of langchain_openai)
- pydantic -- data validation with SecretStr for secure API key handling (transitive dependency)
Credentials
- OPENAI_API_KEY -- OpenAI API key (required; passed as Pydantic SecretStr to prevent accidental exposure in logs)
Quick Install
# Install unstructured with OpenAI extras
pip install "unstructured[openai]"
# Set the API key environment variable
export OPENAI_API_KEY="sk-..."
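After installing, a small preflight script can confirm the three prerequisites listed on this page: the Python version range, the OPENAI_API_KEY variable, and the langchain_openai package. The function name `check_environment` and its messages are assumptions for this sketch; nothing here ships with unstructured.

```python
import importlib.util
import os
import sys


def check_environment(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # Version range taken from the System Requirements table: >= 3.11, < 3.14.
    if not ((3, 11) <= version < (3, 14)):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside >=3.11,<3.14"
        )

    # An empty string counts as unset.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("langchain_openai") is None:
        problems.append(
            'langchain_openai is missing; run: pip install "unstructured[openai]"'
        )

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("PROBLEM:", problem)
```

The `env` and `version` parameters exist only so the check can be exercised without touching the real process environment.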
Code Evidence
Dependency requirement decorator (openai.py):
@requires_dependencies(["langchain_openai"], extras="openai")
def get_client(self):
    from langchain_openai import OpenAIEmbeddings
    return OpenAIEmbeddings(
        model=self.model_name,
        openai_api_key=self.api_key,
    )
Default model configuration (openai.py):
model_name: str = "text-embedding-ada-002"
api_key: SecretStr
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ImportError: langchain_openai is required. Install with: pip install "unstructured[openai]" | The openai extra is not installed | Install via pip install "unstructured[openai]" |
| AuthenticationError: Incorrect API key provided | Invalid or expired OPENAI_API_KEY | Verify the API key is correct and active in your OpenAI dashboard |
| RateLimitError: Rate limit reached | Too many API requests in a short period | Implement retry logic with exponential backoff, or reduce batch size |
| ValidationError: api_key field required | OPENAI_API_KEY environment variable not set | Export the variable: export OPENAI_API_KEY="sk-..." |
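For the RateLimitError row, retry logic with exponential backoff might look like the following generic sketch. `retry_with_backoff` and its parameters are hypothetical names, not part of unstructured or langchain_openai; the exception type to retry on is passed in by the caller, so the helper itself depends only on the standard library.

```python
import random
import time


def retry_with_backoff(func, *, retry_on, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call func(); on a `retry_on` exception, wait and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at
            # max_delay, plus up to 10% random jitter so that parallel workers
            # do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the real client, `retry_on` would presumably be the rate-limit exception class exposed by the installed openai package, and `func` a zero-argument callable wrapping the embedding call for one batch. Check both against the versions you have installed before relying on this.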
Compatibility Notes
- The langchain_openai package is used instead of the raw openai package to maintain consistency with the LangChain ecosystem
- The lazy import pattern in get_client() means the dependency is only needed at runtime, not at module import time
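- SecretStr from Pydantic masks the API key in string representations, and therefore in logs and printed or serialized output that rely on them; the raw value is available only through an explicit get_secret_value() call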
- The default model text-embedding-ada-002 can be overridden by passing a different model name during encoder initialization