Environment: Mlfoundations OpenFlamingo HuggingFace OpenCLIP Dependencies

Knowledge Sources	OpenFlamingo HuggingFace Transformers OpenCLIP
Domains	Infrastructure, Deep_Learning, Computer_Vision
Last Updated	2026-02-08 03:30 GMT

Overview

Python 3.9 environment with HuggingFace Transformers >= 4.28.1, OpenCLIP >= 2.16.0, and supporting libraries for the OpenFlamingo model.

Description

This environment provides the core model dependencies for OpenFlamingo. It combines HuggingFace Transformers (for language models such as OPT, MPT, Pythia, and LLaMA) with OpenCLIP (for vision encoders such as ViT-L-14). Additional packages include einops for tensor manipulation, sentencepiece for tokenization, and Pillow for image handling. Together these are the base requirements for model creation, inference, and weight loading.

Usage

Use this environment for Model Creation and Inference workflows. It is the mandatory prerequisite for initializing a Flamingo model via `create_model_and_transforms()`, loading pretrained weights, and running text generation.
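
As an illustration of the initialization step described above, the following is a minimal sketch of a wrapper around `create_model_and_transforms()`. The wrapper name `build_flamingo` is hypothetical, and the checkpoint identifiers (`ViT-L-14` with the `openai` weights, and an MPT-1B language model) are example values assumed here, not requirements of this environment. Calling the function downloads weights unless they are already cached.

```python
def build_flamingo(
    lang_model="anas-awadalla/mpt-1b-redpajama-200b",  # example checkpoint (assumption)
    cache_dir=None,
):
    """Hypothetical helper: build a Flamingo model, image processor and tokenizer."""
    # Imported lazily so that merely defining this helper does not require
    # the open_flamingo package to be installed.
    from open_flamingo import create_model_and_transforms

    model, image_processor, tokenizer = create_model_and_transforms(
        clip_vision_encoder_path="ViT-L-14",      # OpenCLIP architecture name
        clip_vision_encoder_pretrained="openai",  # OpenCLIP pretrained tag
        lang_encoder_path=lang_model,
        tokenizer_path=lang_model,
        cross_attn_every_n_layers=1,
        cache_dir=cache_dir,
    )
    return model, image_processor, tokenizer
```

A typical call would be `model, image_processor, tokenizer = build_flamingo(cache_dir="/path/to/cache")`, after which pretrained Flamingo weights can be loaded into `model`.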

System Requirements

Category	Requirement	Notes
Python	3.9	Specified in `environment.yml` and `setup.py` classifiers
Java	OpenJDK	Required via `conda-forge::openjdk` in `environment.yml`

Dependencies

Python Packages

`transformers` >= 4.28.1
`open_clip_torch` >= 2.16.0
`torch` == 2.0.1
`einops`
`einops-exts`
`pillow`
`sentencepiece`
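
To confirm that an existing environment satisfies the list above, a small check along the following lines may help. This is a sketch using only the standard library; the helper names (`version_tuple`, `satisfies`, `check_environment`) are hypothetical, and the version comparison is deliberately simplified (it looks only at the leading digits of each dotted component), so it is not a full PEP 440 implementation.

```python
import re
from importlib import metadata

# Constraints copied from the package list above; None means "any version".
REQUIRED = {
    "transformers": (">=", "4.28.1"),
    "open_clip_torch": (">=", "2.16.0"),
    "torch": ("==", "2.0.1"),
    "einops": None,
    "einops-exts": None,
    "pillow": None,
    "sentencepiece": None,
}


def version_tuple(version):
    """Turn '4.28.1' or '2.0.1+cu117' into a tuple of ints, padded to 3 parts."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(installed, constraint):
    """Check an installed version string against an (operator, version) pair."""
    if constraint is None:
        return True
    op, wanted = constraint
    if op == "==":
        return version_tuple(installed) == version_tuple(wanted)
    return version_tuple(installed) >= version_tuple(wanted)


def check_environment(required=REQUIRED):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, constraint in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not satisfies(installed, constraint):
            op, wanted = constraint
            problems.append(f"{name}: found {installed}, need {op}{wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all requirements satisfied"]:
        print(line)
```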

Credentials

No credentials are required for this base environment. Only gated models on the HuggingFace Hub need a token:

`HF_TOKEN`: HuggingFace API token (needed only when downloading gated models)

Quick Install

# Install from setup.py (recommended)
pip install -e .

# Or install manually (quote the specifiers so the shell does not treat ">=" as a redirect)
pip install "torch==2.0.1" "transformers>=4.28.1" "open_clip_torch>=2.16.0" einops einops-exts pillow sentencepiece

# With conda (full environment)
conda env create -f environment.yml

Code Evidence

Package requirements from `setup.py:9-17`:

REQUIREMENTS = [
    "einops",
    "einops-exts",
    "transformers>=4.28.1",
    "torch==2.0.1",
    "pillow",
    "open_clip_torch>=2.16.0",
    "sentencepiece",
]

OpenCLIP usage for vision encoder in `open_flamingo/src/factory.py:42-46`:

vision_encoder, _, image_processor = open_clip.create_model_and_transforms(
    clip_vision_encoder_path,
    pretrained=clip_vision_encoder_pretrained,
    cache_dir=cache_dir,
)

HuggingFace Transformers usage for language model in `open_flamingo/src/factory.py:65-70`:

lang_encoder = AutoModelForCausalLM.from_pretrained(
    lang_encoder_path,
    local_files_only=use_local_files,
    trust_remote_code=True,
    cache_dir=cache_dir,
)

Supported LM decoder layer names from `open_flamingo/src/factory.py:132-141`:

__KNOWN_DECODER_LAYERS_ATTR_NAMES = {
    "opt": "model.decoder.layers",
    "gptj": "transformer.h",
    "gpt-j": "transformer.h",
    "pythia": "gpt_neox.layers",
    "llama": "model.layers",
    "gptneoxforcausallm": "gpt_neox.layers",
    "mpt": "transformer.blocks",
    "mosaicgpt": "transformer.blocks",
}
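
The table above can be used to look up where the decoder blocks live on a given language model. The following is a sketch of one plausible lookup, matching each key as a substring of the lower-cased model class name; the function names `infer_decoder_layers_attr_name` and `get_nested_attr` and the stand-in `OPTForCausalLM` class are illustrative assumptions, not quotes from `factory.py`.

```python
from types import SimpleNamespace

KNOWN_DECODER_LAYERS_ATTR_NAMES = {
    "opt": "model.decoder.layers",
    "gptj": "transformer.h",
    "gpt-j": "transformer.h",
    "pythia": "gpt_neox.layers",
    "llama": "model.layers",
    "gptneoxforcausallm": "gpt_neox.layers",
    "mpt": "transformer.blocks",
    "mosaicgpt": "transformer.blocks",
}


def infer_decoder_layers_attr_name(model):
    """Return the dotted attribute path of the decoder layers, or raise ValueError."""
    class_name = model.__class__.__name__.lower()
    for key, attr_name in KNOWN_DECODER_LAYERS_ATTR_NAMES.items():
        if key in class_name:
            return attr_name
    raise ValueError(
        f"No known decoder layers attribute for {model.__class__.__name__}; "
        "pass decoder_layers_attr_name explicitly."
    )


def get_nested_attr(obj, dotted_path):
    """Follow a dotted path such as 'model.decoder.layers' one attribute at a time."""
    for part in dotted_path.split("."):
        obj = getattr(obj, part)
    return obj


class OPTForCausalLM:
    """Stand-in with the same class name and layout as an OPT model (no weights)."""

    def __init__(self):
        self.model = SimpleNamespace(
            decoder=SimpleNamespace(layers=["block0", "block1"])
        )


if __name__ == "__main__":
    lm = OPTForCausalLM()
    path = infer_decoder_layers_attr_name(lm)
    print(path, len(get_nested_attr(lm, path)))
```

Because the match is by substring, a model whose class name contains none of the keys falls through to the `ValueError`, which is the situation in which the attribute name has to be supplied by hand.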

Common Errors

Error Message	Cause	Solution
`ValueError: We require the attribute name for the nn.ModuleList`	Unsupported language model architecture	Pass `decoder_layers_attr_name` to `create_model_and_transforms()` manually
`trust_remote_code=True` warning	Using custom model architectures (MPT)	This is expected; flag is set automatically in factory.py
MPT missing `get_input_embeddings`	MPT-1B model lacks standard HF method	Handled by `EmbeddingFnMixin` hack in factory.py:73-82

Compatibility Notes

Supported LMs: OPT, GPT-J, Pythia (and other GPT-NeoX models), LLaMA, MPT, MosaicGPT. Any other architecture requires specifying `decoder_layers_attr_name`.
MPT-1B: Requires a runtime monkey-patch to add `get_input_embeddings` / `set_input_embeddings` methods (factory.py:73-82).
Offline mode: Set `--offline` so that HuggingFace Transformers loads models with `local_files_only=True`, using only files already present in the local cache.
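
The MPT-1B note above describes adding missing methods to an already constructed model object. A minimal sketch of that pattern follows: a mixin class is spliced into the instance's class at runtime. The `FakeMPT` class is a weightless stand-in, and the assumption that the embedding table lives at `transformer.wte` is made for illustration; the exact code in `factory.py` may differ.

```python
from types import SimpleNamespace


class EmbeddingFnMixin:
    """Adds the accessor methods that the model class does not define itself."""

    def get_input_embeddings(self):
        return self.transformer.wte

    def set_input_embeddings(self, new_embeddings):
        self.transformer.wte = new_embeddings


def extend_instance(obj, mixin):
    """Give a single existing object the mixin's methods.

    A new class is created on the fly that inherits from the mixin and the
    object's original class, and the object is re-pointed at it. Other
    instances of the original class are left untouched.
    """
    base_cls = obj.__class__
    obj.__class__ = type(base_cls.__name__, (mixin, base_cls), {})


class FakeMPT:
    """Stand-in for an MPT-style model without get_input_embeddings."""

    def __init__(self):
        self.transformer = SimpleNamespace(wte="embedding-table")


if __name__ == "__main__":
    lang_encoder = FakeMPT()
    extend_instance(lang_encoder, EmbeddingFnMixin)
    print(lang_encoder.get_input_embeddings())
```

Patching the instance rather than the class keeps the change local to the one model being wrapped, at the cost of the object's type no longer being identical to the original class (it remains a subclass, so `isinstance` checks still pass).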
