Environment:Infiniflow Ragflow GPU CUDA Environment

Knowledge Sources	RAGFlow deepdoc OCR
Domains	Infrastructure, Computer_Vision, Deep_Learning
Last Updated	2026-02-12 06:00 GMT

Overview

Optional NVIDIA CUDA GPU environment that accelerates OCR inference, layout recognition, and embedding generation through the ONNX Runtime CUDA execution provider.

Description

This environment enables GPU-accelerated inference for RAGFlow's document processing pipeline. When an NVIDIA GPU with CUDA support is detected, ONNX Runtime sessions are created with `CUDAExecutionProvider` instead of `CPUExecutionProvider` for OCR text recognition, layout analysis, and table structure recognition. GPU memory is capped at 2 GB by default (configurable through `OCR_GPU_MEM_LIMIT_MB`) and is managed with arena-based allocation. When no GPU is available, the system falls back to CPU without any configuration change.
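The provider switch described above can be sketched as a small pure function. This is a minimal illustration, not RAGFlow's code: `choose_providers` is a hypothetical helper, and it decides from a list of provider names, whereas RAGFlow detects the GPU through `torch.cuda` (see Code Evidence). The only ONNX Runtime behaviour assumed is that the `providers` argument accepts both plain names and `(name, options)` tuples.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# A provider entry is either a name or a (name, options) tuple.
Provider = Union[str, Tuple[str, Dict[str, Any]]]


def choose_providers(available: Sequence[str],
                     cuda_options: Dict[str, Any]) -> List[Provider]:
    """Hypothetical helper: prefer CUDA when offered, always keep CPU as fallback."""
    if "CUDAExecutionProvider" in available:
        return [("CUDAExecutionProvider", dict(cuda_options)), "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


# CPU-only host: only the CPU provider is selected.
print(choose_providers(["CPUExecutionProvider"], {"device_id": 0}))

# GPU host: CUDA comes first, CPU stays in the list as a fallback.
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"],
                       {"device_id": 0}))
```

With `onnxruntime` installed, the `available` argument could come from `onnxruntime.get_available_providers()`, and the returned list could be passed as `providers=` to `onnxruntime.InferenceSession`.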

Usage

Use this environment when document processing needs GPU acceleration, particularly for high-volume OCR and layout recognition workloads. Enable it by setting `DEVICE=gpu` in `docker/.env` and starting the stack with the `gpu` Docker Compose profile. The GPU is optional: all features work on CPU, only more slowly.

System Requirements

Category	Requirement	Notes
OS	Linux (Ubuntu 20.04+)	Docker GPU passthrough requires Linux
Hardware	NVIDIA GPU with CUDA support	Any CUDA-capable GPU; 2GB+ VRAM recommended
Software	NVIDIA Driver + CUDA Toolkit	Compatible with ONNX Runtime 1.23.2
Docker	nvidia-container-toolkit	Required for Docker GPU passthrough

Dependencies

System Packages

NVIDIA GPU driver (host)
`nvidia-container-toolkit` (Docker GPU support)

Python Packages

`onnxruntime-gpu` == 1.23.2 (Linux x86_64 only)
`torch` (with CUDA support, used for GPU detection)

Credentials

No additional credentials required beyond the base Python runtime environment.

Quick Install

# Enable GPU in Docker deployment
# In docker/.env:
DEVICE=gpu

# Start with the GPU profile (run from the docker/ directory)
docker compose --profile gpu -f docker-compose.yml up -d

# Configure OCR GPU memory (optional)
export OCR_GPU_MEM_LIMIT_MB=4096   # Default: 2048 (2GB)
export OCR_INTRA_OP_NUM_THREADS=4  # Default: 2
export OCR_INTER_OP_NUM_THREADS=4  # Default: 2
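To make the defaults above concrete, here is a minimal sketch of reading the three tuning variables in Python. The variable names and default values come from this page; `OcrTuning` and `read_ocr_tuning` are hypothetical names introduced for illustration and are not part of RAGFlow.

```python
import os
from typing import Mapping, NamedTuple


class OcrTuning(NamedTuple):
    """Hypothetical container for the OCR tuning values."""
    gpu_mem_limit_mb: int
    intra_op_threads: int
    inter_op_threads: int


def read_ocr_tuning(env: Mapping[str, str] = os.environ) -> OcrTuning:
    """Read the tuning variables, using the documented defaults when unset."""
    return OcrTuning(
        gpu_mem_limit_mb=int(env.get("OCR_GPU_MEM_LIMIT_MB", "2048")),
        intra_op_threads=int(env.get("OCR_INTRA_OP_NUM_THREADS", "2")),
        inter_op_threads=int(env.get("OCR_INTER_OP_NUM_THREADS", "2")),
    )


# Nothing set: the defaults apply.
print(read_ocr_tuning({}))
# OcrTuning(gpu_mem_limit_mb=2048, intra_op_threads=2, inter_op_threads=2)

# Only some variables overridden: the rest keep their defaults.
print(read_ocr_tuning({"OCR_GPU_MEM_LIMIT_MB": "4096",
                       "OCR_INTRA_OP_NUM_THREADS": "4"}))
# OcrTuning(gpu_mem_limit_mb=4096, intra_op_threads=4, inter_op_threads=2)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.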

Code Evidence

CUDA detection and ONNX provider selection from `deepdoc/vision/ocr.py:85-133`:

def cuda_is_available():
    try:
        pip_install_torch()
        import torch
        target_id = 0 if device_id is None else device_id
        if torch.cuda.is_available() and torch.cuda.device_count() > target_id:
            return True
    except Exception:
        return False
    return False

if cuda_is_available():
    gpu_mem_limit_mb = int(os.environ.get("OCR_GPU_MEM_LIMIT_MB", "2048"))
    cuda_provider_options = {
        "device_id": provider_device_id,
        "gpu_mem_limit": max(gpu_mem_limit_mb, 0) * 1024 * 1024,
        "arena_extend_strategy": arena_strategy,
    }
    sess = ort.InferenceSession(model_file_path, options=options,
                                providers=['CUDAExecutionProvider'],
                                provider_options=[cuda_provider_options])
else:
    sess = ort.InferenceSession(model_file_path, options=options,
                                providers=['CPUExecutionProvider'])
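The memory cap in the excerpt is given in megabytes but passed to ONNX Runtime in bytes, with negative values clamped to zero. The sketch below reproduces only that arithmetic. `build_cuda_options` is a hypothetical helper, and the default `kNextPowerOfTwo` is an assumption: it is one of the values ONNX Runtime accepts for `arena_extend_strategy`, but the value RAGFlow assigns to `arena_strategy` is not shown in the excerpt.

```python
from typing import Any, Dict

MIB = 1024 * 1024  # bytes per mebibyte


def build_cuda_options(device_id: int, mem_limit_mb: int,
                       arena_strategy: str = "kNextPowerOfTwo") -> Dict[str, Any]:
    """Hypothetical helper mirroring the option dict built in the excerpt."""
    return {
        "device_id": device_id,
        # Clamp negative limits to 0, then convert MB to bytes.
        "gpu_mem_limit": max(mem_limit_mb, 0) * MIB,
        "arena_extend_strategy": arena_strategy,
    }


print(build_cuda_options(0, 2048)["gpu_mem_limit"])  # 2147483648 (2 GiB)
print(build_cuda_options(0, 4096)["gpu_mem_limit"])  # 4294967296 (4 GiB)
print(build_cuda_options(0, -1)["gpu_mem_limit"])    # 0 (negative input clamped)
```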

GPU device count check from `common/settings.py:356-364`:

def check_and_install_torch():
    import torch.cuda
    PARALLEL_DEVICES = torch.cuda.device_count()

Docker GPU passthrough from `docker/docker-compose.yml:103-106`:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

Common Errors

Error Message	Cause	Solution
`CUDAExecutionProvider not available`	CUDA toolkit not installed or incompatible	Install matching CUDA version for ONNX Runtime 1.23.2
`CUDA out of memory`	GPU VRAM exhausted	Reduce `OCR_GPU_MEM_LIMIT_MB` or set `OCR_GPUMEM_ARENA_SHRINKAGE=1`
`torch.cuda.is_available()` returns `False`	No NVIDIA driver or GPU not detected	Install NVIDIA drivers; check `nvidia-smi`
GPU not passed through in Docker	Missing nvidia-container-toolkit	Install `nvidia-container-toolkit` and restart Docker daemon
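Several of the errors above come down to a missing prerequisite. A rough preflight check, using only the Python standard library, might look like the following. `gpu_preflight` is a hypothetical helper and is not part of RAGFlow; it only reports what is visible from the current process and does not confirm that CUDA actually works.

```python
import importlib.util
import shutil
from typing import Dict


def gpu_preflight() -> Dict[str, bool]:
    """Hypothetical check: report which GPU prerequisites are visible here."""
    return {
        # The NVIDIA driver normally ships the nvidia-smi command-line tool.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # torch is used for CUDA detection.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # onnxruntime-gpu is imported under the module name "onnxruntime".
        "onnxruntime_installed": importlib.util.find_spec("onnxruntime") is not None,
    }


for check, ok in gpu_preflight().items():
    print(f"{check}: {'ok' if ok else 'missing'}")
```

Run it inside the container as well as on the host: a check that passes on the host but fails in the container points at the Docker GPU passthrough rather than the driver.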

Compatibility Notes

CPU fallback: All GPU-accelerated features gracefully fall back to CPU when no GPU is detected. No code changes needed.
macOS: GPU acceleration not supported. Uses CPU-only `onnxruntime`.
Multi-GPU: Supported via `device_id` parameter and `CUDA_VISIBLE_DEVICES` environment variable. OCR uses `asyncio.Semaphore` for concurrent GPU access.
VRAM management: Set `OCR_GPUMEM_ARENA_SHRINKAGE=1` to release VRAM back to the system after each inference run. This reduces peak usage but may slow down sequential operations.
Ascend NPU: Alternative to CUDA via `LAYOUT_RECOGNIZER_TYPE=ascend` and `ASCEND_LAYOUT_RECOGNIZER_DEVICE_ID`.
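The note on concurrent GPU access can be illustrated with a small, self-contained `asyncio.Semaphore` sketch. It shows the general pattern only: `run_with_gpu_limit` and `fake_inference` are hypothetical names, the sleep stands in for a real inference call, and how RAGFlow sizes or scopes its semaphore is not shown on this page.

```python
import asyncio


async def run_with_gpu_limit(n_jobs: int, max_concurrent: int) -> int:
    """Run n_jobs fake inference tasks; return the peak number running at once."""
    gate = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def fake_inference(job_id: int) -> None:
        nonlocal running, peak
        async with gate:  # at most max_concurrent tasks pass this point together
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for a GPU inference call
            running -= 1

    await asyncio.gather(*(fake_inference(i) for i in range(n_jobs)))
    return peak


# Eight jobs compete for two slots, so the observed peak is 2.
print(asyncio.run(run_with_gpu_limit(n_jobs=8, max_concurrent=2)))  # 2
```

Bounding concurrency this way keeps several coroutines from submitting work to the same device at once, which helps keep memory use within the configured limit.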
