Environment:Infiniflow Ragflow GPU CUDA Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Computer_Vision, Deep_Learning |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
Optional NVIDIA CUDA GPU environment for accelerating OCR inference, layout recognition, and embedding generation via the ONNX Runtime CUDA execution provider.
Description
This environment enables GPU-accelerated inference for RAGFlow's document processing pipeline. When an NVIDIA GPU with CUDA support is detected, ONNX Runtime sessions are created with `CUDAExecutionProvider` instead of `CPUExecutionProvider` for OCR text recognition, layout analysis, and table structure recognition. GPU memory is capped at 2 GB (2048 MB) by default, configurable through `OCR_GPU_MEM_LIMIT_MB`, and is managed with arena-based allocation. When no usable GPU is found, the system falls back to CPU.
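
The provider switch described above can be sketched as a small pure function. This is a minimal illustration, not RAGFlow's actual code: `select_providers` and its `want_gpu` flag are hypothetical names, and listing the CPU provider after CUDA as a secondary fallback is an assumption (the excerpt under Code Evidence requests a single provider per session).

```python
from typing import List

CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def select_providers(available: List[str], want_gpu: bool = True) -> List[str]:
    """Return the execution providers to request, most preferred first.

    `available` is the list reported by the runtime, for example the result of
    onnxruntime.get_available_providers(). Hypothetical helper for illustration.
    """
    if want_gpu and CUDA in available:
        return [CUDA, CPU]  # assumption: keep CPU as a secondary fallback
    return [CPU]


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional; only needed for the live query
        available = ort.get_available_providers()
    except ImportError:
        available = [CPU]
    print(select_providers(available))
```

Keeping the decision in a function that takes the provider list as an argument makes it testable on machines without a GPU or without `onnxruntime` installed.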
Usage
Use this environment when GPU-accelerated document processing is needed, particularly for high-volume OCR and layout recognition workloads. Enable it by setting `DEVICE=gpu` in `docker/.env` and starting the `ragflow-gpu` service through the `gpu` Docker Compose profile (`--profile gpu`). The GPU is optional: all features work on CPU, only more slowly.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | Docker GPU passthrough requires Linux |
| Hardware | NVIDIA GPU with CUDA support | Any CUDA-capable GPU; 2GB+ VRAM recommended |
| Software | NVIDIA Driver + CUDA Toolkit | Compatible with ONNX Runtime 1.23.2 |
| Docker | nvidia-container-toolkit | Required for Docker GPU passthrough |
Dependencies
System Packages
- NVIDIA GPU driver (host)
- `nvidia-container-toolkit` (Docker GPU support)
Python Packages
- `onnxruntime-gpu` == 1.23.2 (Linux x86_64 only)
- `torch` (built with CUDA support; used to detect CUDA devices)
Credentials
No additional credentials required beyond the base Python runtime environment.
Quick Install
# Enable GPU in Docker deployment
# In docker/.env:
DEVICE=gpu
# Start with GPU profile
docker compose --profile gpu -f docker-compose.yml up -d
# Configure OCR GPU memory (optional)
export OCR_GPU_MEM_LIMIT_MB=4096 # Default: 2048 (2GB)
export OCR_INTRA_OP_NUM_THREADS=4 # Default: 2
export OCR_INTER_OP_NUM_THREADS=4 # Default: 2
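
As a rough sketch of how a process might read the three variables above, the following helper parses them with the defaults listed in the comments (2048, 2, 2). `OcrRuntimeSettings`, `_int_from_env`, and `load_ocr_settings` are hypothetical names, and silently falling back to the default on a non-integer value is an assumption, not documented RAGFlow behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OcrRuntimeSettings:
    gpu_mem_limit_mb: int
    intra_op_num_threads: int
    inter_op_num_threads: int


def _int_from_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer variable; fall back to the default on bad input (assumption)."""
    try:
        return int(env.get(name, default))
    except ValueError:
        return default


def load_ocr_settings(env: Mapping[str, str] = os.environ) -> OcrRuntimeSettings:
    """Collect the OCR tuning variables, using the defaults named in this page."""
    return OcrRuntimeSettings(
        # Negative limits are clamped to 0, as in the Code Evidence excerpt.
        gpu_mem_limit_mb=max(_int_from_env(env, "OCR_GPU_MEM_LIMIT_MB", 2048), 0),
        intra_op_num_threads=_int_from_env(env, "OCR_INTRA_OP_NUM_THREADS", 2),
        inter_op_num_threads=_int_from_env(env, "OCR_INTER_OP_NUM_THREADS", 2),
    )


if __name__ == "__main__":
    print(load_ocr_settings())
```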
Code Evidence
CUDA detection and ONNX provider selection from `deepdoc/vision/ocr.py:85-133`:
def cuda_is_available():
    try:
        pip_install_torch()
        import torch
        target_id = 0 if device_id is None else device_id
        if torch.cuda.is_available() and torch.cuda.device_count() > target_id:
            return True
    except Exception:
        return False
    return False

if cuda_is_available():
    gpu_mem_limit_mb = int(os.environ.get("OCR_GPU_MEM_LIMIT_MB", "2048"))
    cuda_provider_options = {
        "device_id": provider_device_id,
        "gpu_mem_limit": max(gpu_mem_limit_mb, 0) * 1024 * 1024,
        "arena_extend_strategy": arena_strategy,
    }
    sess = ort.InferenceSession(model_file_path, options=options,
                                providers=['CUDAExecutionProvider'],
                                provider_options=[cuda_provider_options])
else:
    sess = ort.InferenceSession(model_file_path, options=options,
                                providers=['CPUExecutionProvider'])
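
The memory-limit arithmetic in the excerpt converts megabytes to bytes and clamps negative values to zero. A minimal standalone sketch of that step follows; `build_cuda_provider_options` is a hypothetical helper, and the `"kNextPowerOfTwo"` default for `arena_extend_strategy` is an assumption, since the excerpt does not show where `arena_strategy` comes from.

```python
from typing import Any, Dict

BYTES_PER_MB = 1024 * 1024


def build_cuda_provider_options(
    device_id: int = 0,
    gpu_mem_limit_mb: int = 2048,
    arena_extend_strategy: str = "kNextPowerOfTwo",  # assumed default
) -> Dict[str, Any]:
    """Build the options dict passed alongside CUDAExecutionProvider."""
    return {
        "device_id": device_id,
        # The limit is expressed in bytes; negative input is clamped to 0.
        "gpu_mem_limit": max(gpu_mem_limit_mb, 0) * BYTES_PER_MB,
        "arena_extend_strategy": arena_extend_strategy,
    }


if __name__ == "__main__":
    opts = build_cuda_provider_options()
    print(opts["gpu_mem_limit"])  # 2048 * 1024 * 1024 = 2147483648
```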
GPU device count check from `common/settings.py:356-364`:
def check_and_install_torch():
    import torch.cuda
    PARALLEL_DEVICES = torch.cuda.device_count()
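
The device-count check and the target-device test from the two excerpts above can be combined into a sketch that also runs where PyTorch is absent. Both function names are hypothetical, and returning 0 when `torch` cannot be imported is an assumption about the desired fallback.

```python
from typing import Optional


def cuda_device_count() -> int:
    """Number of CUDA devices visible to PyTorch, or 0 if that cannot be determined."""
    try:
        import torch  # optional dependency
        return int(torch.cuda.device_count())
    except Exception:
        return 0


def device_is_usable(device_id: Optional[int], device_count: int) -> bool:
    """Default to device 0 and require that the target index exists."""
    target_id = 0 if device_id is None else device_id
    return device_count > target_id


if __name__ == "__main__":
    count = cuda_device_count()
    print(count, device_is_usable(None, count))
```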
Docker GPU passthrough from `docker/docker-compose.yml:103-106`:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDAExecutionProvider not available` | CUDA toolkit not installed or incompatible | Install matching CUDA version for ONNX Runtime 1.23.2 |
| `CUDA out of memory` | GPU VRAM exhausted | Reduce `OCR_GPU_MEM_LIMIT_MB` or set `OCR_GPUMEM_ARENA_SHRINKAGE=1` |
| `torch.cuda.is_available()` returns `False` | No NVIDIA driver or GPU not detected | Install NVIDIA drivers; check `nvidia-smi` |
| GPU not passed through in Docker | Missing nvidia-container-toolkit | Install `nvidia-container-toolkit` and restart Docker daemon |
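
Several of the errors in the table above can be narrowed down with a quick preflight check. The script below is a sketch under stated assumptions: `gpu_preflight` and its report keys are hypothetical, and it only inspects what the current Python environment can see (for example, inside the container).

```python
import importlib.util
import shutil
from typing import Dict


def gpu_preflight() -> Dict[str, bool]:
    """Report which pieces of the GPU stack are visible from this process."""
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "onnxruntime_installed": importlib.util.find_spec("onnxruntime") is not None,
        "torch_sees_cuda": False,
        "ort_has_cuda_provider": False,
    }
    if report["torch_installed"]:
        try:
            import torch
            report["torch_sees_cuda"] = bool(torch.cuda.is_available())
        except Exception:
            pass  # a broken install is reported as "no CUDA"
    if report["onnxruntime_installed"]:
        try:
            import onnxruntime as ort
            report["ort_has_cuda_provider"] = (
                "CUDAExecutionProvider" in ort.get_available_providers()
            )
        except Exception:
            pass
    return report


if __name__ == "__main__":
    for name, ok in gpu_preflight().items():
        print(f"{name}: {ok}")
```

If `nvidia_smi_on_path` is false inside the container but true on the host, the passthrough row in the table is the likely culprit.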
Compatibility Notes
- CPU fallback: All GPU-accelerated features gracefully fall back to CPU when no GPU is detected. No code changes needed.
- macOS: GPU acceleration is not supported; the CPU-only `onnxruntime` package is used.
- Multi-GPU: Supported via `device_id` parameter and `CUDA_VISIBLE_DEVICES` environment variable. OCR uses `asyncio.Semaphore` for concurrent GPU access.
- VRAM management: Set `OCR_GPUMEM_ARENA_SHRINKAGE=1` to release VRAM back to the system after each inference run (this reduces peak usage but may slow down sequential operations).
- Ascend NPU: Alternative to CUDA via `LAYOUT_RECOGNIZER_TYPE=ascend` and `ASCEND_LAYOUT_RECOGNIZER_DEVICE_ID`.
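
The multi-GPU note above mentions `asyncio.Semaphore` for concurrent GPU access. The following sketch shows the general pattern only, with one semaphore per device; `GpuGate`, its `per_device` limit, and the `asyncio.sleep` stand-in for inference are all assumptions for illustration, not RAGFlow's implementation.

```python
import asyncio
from typing import Dict, List


class GpuGate:
    """One semaphore per device so that at most `per_device` jobs run on each GPU."""

    def __init__(self, device_ids: List[int], per_device: int = 1) -> None:
        self._sems: Dict[int, asyncio.Semaphore] = {
            d: asyncio.Semaphore(per_device) for d in device_ids
        }
        self.active: Dict[int, int] = {d: 0 for d in device_ids}
        self.peak: Dict[int, int] = {d: 0 for d in device_ids}

    async def run(self, device_id: int, job_seconds: float) -> int:
        async with self._sems[device_id]:
            self.active[device_id] += 1
            self.peak[device_id] = max(self.peak[device_id], self.active[device_id])
            try:
                await asyncio.sleep(job_seconds)  # stand-in for an OCR inference call
            finally:
                self.active[device_id] -= 1
        return device_id


async def demo() -> Dict[int, int]:
    gate = GpuGate(device_ids=[0, 1], per_device=1)
    # Six jobs spread round-robin over two devices.
    await asyncio.gather(*(gate.run(i % 2, 0.01) for i in range(6)))
    return gate.peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `per_device=1`, the recorded peak concurrency per device stays at 1 even though three jobs are queued for each GPU.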