Overview
Image loading and preprocessing utilities for LLaVA models, supporting URL/base64/file sources, square padding, and model-specific image processing pipelines.
Description
This module, modified from the original LLaVA codebase by Haotian Liu (Apache 2.0 licensed), provides a set of utility functions for loading and preparing images for LLaVA and VILA vision-language models.
load_image_from_base64 decodes a base64-encoded string and returns a PIL Image object, useful for handling images transmitted as encoded strings in API contexts.
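The decoding step can be sketched as follows. This is a minimal illustration that assumes Pillow is installed; it is not the module's exact code. The `encode_image_to_base64` helper is hypothetical and exists only to produce test input for the round trip. The sketch returns the image in whatever mode the encoded file uses, since the description above does not mention an RGB conversion for this function.

```python
import base64
from io import BytesIO

from PIL import Image


def load_image_from_base64(image: str) -> Image.Image:
    """Decode a base64 string into a PIL image (sketch, not the module's code)."""
    raw = base64.b64decode(image)
    return Image.open(BytesIO(raw))


def encode_image_to_base64(img: Image.Image, fmt: str = "PNG") -> str:
    """Hypothetical helper: serialize a PIL image to a base64 string."""
    buffer = BytesIO()
    img.save(buffer, format=fmt)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Round trip: a 4x2 red image is encoded, then decoded again.
encoded = encode_image_to_base64(Image.new("RGB", (4, 2), (255, 0, 0)))
decoded = load_image_from_base64(encoded)
```

PNG is used for the round trip because it is lossless, so pixel values can be compared exactly after decoding.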
load_image loads a single image from either a URL (http/https) or a local file path. For URLs, it uses requests.get to fetch the content and wraps the response bytes in a BytesIO stream. All images are converted to RGB mode to ensure consistent channel format.
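A rough sketch of the URL-versus-path dispatch described above, assuming `requests` and Pillow are available. The `timeout` value and the `raise_for_status()` call are assumptions added for robustness and may not be present in the actual module.

```python
import os
import tempfile
from io import BytesIO

import requests
from PIL import Image


def load_image(image_file: str) -> Image.Image:
    """Load an image from an http(s) URL or a local path and return it as RGB."""
    if image_file.startswith(("http://", "https://")):
        # Assumption: a timeout and an explicit HTTP error check.
        response = requests.get(image_file, timeout=10)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    else:
        image = Image.open(image_file)
    # Normalize to three channels regardless of the source mode.
    return image.convert("RGB")


# Local-path usage: a grayscale file comes back as a 3-channel RGB image.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gray.png")
    Image.new("L", (3, 5), 128).save(path)
    loaded = load_image(path)
```

Because `convert("RGB")` forces the pixel data to be read, the returned image no longer depends on the source file or the response buffer.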
load_images is a batch wrapper that calls load_image for each file in a list, returning a list of PIL Image objects.
vis_images provides terminal-based image visualization using the termvisage command-line tool. A single image is displayed directly at a height of 12 terminal lines, left-aligned. For multiple images, it first calls ImageMagick's convert command to resize every image to a height of 500px, splice a 100px gap between them, and append them horizontally into a temporary file (.vis.jpg), then displays that composite.
expand2square pads a non-square PIL image to a square by adding background-colored borders. If the image is wider than tall, it creates a new square canvas matching the width and pastes the image centered vertically; if taller than wide, it centers horizontally. Square images are returned unchanged. The background color is typically derived from the image processor's mean pixel values.
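The padding behavior described above can be sketched with Pillow as follows. This is an illustrative version, written from the description rather than copied from the module; it folds the wide and tall cases into one offset computation.

```python
from PIL import Image


def expand2square(pil_img, background_color):
    """Pad a PIL image to a square canvas, keeping the content centered."""
    width, height = pil_img.size
    if width == height:
        return pil_img  # already square: returned unchanged
    side = max(width, height)
    result = Image.new(pil_img.mode, (side, side), background_color)
    # The offset is zero along the longer axis and centers the shorter one.
    offset = ((side - width) // 2, (side - height) // 2)
    result.paste(pil_img, offset)
    return result


# A 6x2 white strip on a black background becomes a 6x6 image,
# with the strip occupying rows 2 and 3.
wide = Image.new("RGB", (6, 2), (255, 255, 255))
padded = expand2square(wide, (0, 0, 0))
square = Image.new("RGB", (3, 3))
unchanged = expand2square(square, (0, 0, 0))
```

When the size difference is odd, integer division places the extra pixel of padding on the bottom or right side.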
process_images is the main preprocessing pipeline that integrates with HuggingFace image processors. When the model configuration specifies image_aspect_ratio == "pad", it applies expand2square using the processor's image_mean as background, then preprocesses with the image processor. Otherwise, it applies the processor directly. There is special-case handling for InternViT-based processors (detected by class name), which require an extra unsqueeze(0) dimension. The function returns either a stacked tensor (if all processed images have the same shape) or a list of tensors.
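The "pad" branch and the stack-or-list return value can be sketched as below, assuming PyTorch and Pillow are installed. Several parts are assumptions for illustration: `FakeProcessor` is a hypothetical stand-in for a HuggingFace image processor, `pad_to_square` is a compact replacement for expand2square, and scaling image_mean by 255 to get an RGB background is an assumed conversion. The InternViT special case is omitted.

```python
import torch
from PIL import Image


class FakeProcessor:
    """Hypothetical stand-in for a HuggingFace image processor."""

    image_mean = [0.48145466, 0.4578275, 0.40821073]  # CLIP-style means (assumed)

    def preprocess(self, image, return_tensors="pt"):
        # Returns zeros shaped like the image; a real processor would
        # resize, normalize, and convert the pixel data.
        width, height = image.size
        return {"pixel_values": torch.zeros(1, 3, height, width)}


def mean_to_rgb(image_mean):
    """Scale per-channel means in [0, 1] to an integer 0-255 RGB tuple."""
    return tuple(int(x * 255) for x in image_mean)


def pad_to_square(img, color):
    """Compact stand-in for expand2square."""
    side = max(img.size)
    canvas = Image.new(img.mode, (side, side), color)
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas


def process_images_pad(images, image_processor):
    """Pad each image, preprocess it, then stack when all shapes agree."""
    background = mean_to_rgb(image_processor.image_mean)
    processed = []
    for image in images:
        squared = pad_to_square(image, background)
        out = image_processor.preprocess(squared, return_tensors="pt")
        processed.append(out["pixel_values"][0])
    if all(t.shape == processed[0].shape for t in processed):
        return torch.stack(processed, dim=0)
    return processed


proc = FakeProcessor()
same = process_images_pad([Image.new("RGB", (8, 4)), Image.new("RGB", (4, 8))], proc)
mixed = process_images_pad([Image.new("RGB", (8, 4)), Image.new("RGB", (6, 6))], proc)
```

Here `same` is a single stacked tensor because both inputs pad to 8x8, while `mixed` stays a list because the padded sizes differ. Real processors usually resize to a fixed resolution, so the list case is most likely with processors that preserve input size.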
Usage
Use these utilities in any LLaVA or VILA demo script that requires image input. The functions are imported and used in vila10_demo.py and vila15_demo.py for loading and preprocessing images before passing them to the model. They are also imported in nvila_demo.py and internvl_demo.py for terminal-based image visualization.
Code Reference
Source Location
Signature
def load_image_from_base64(image):
def load_image(image_file):
def load_images(image_files):
def vis_images(image_files):
def expand2square(pil_img, background_color):
def process_images(images, image_processor, model_cfg):
Import
from tinychat.utils.llava_image_processing import (
    load_image_from_base64,
    load_image,
    load_images,
    vis_images,
    expand2square,
    process_images,
)
I/O Contract
load_image_from_base64
| Parameter | Type | Description |
|---|---|---|
| image | str | Base64-encoded image string |

| Returns | Type | Description |
|---|---|---|
| image | PIL.Image | Decoded PIL Image object |
load_image
| Parameter | Type | Description |
|---|---|---|
| image_file | str | URL (http/https) or local file path |

| Returns | Type | Description |
|---|---|---|
| image | PIL.Image | Loaded RGB PIL Image object |
load_images
| Parameter | Type | Description |
|---|---|---|
| image_files | list[str] | List of URLs or file paths |

| Returns | Type | Description |
|---|---|---|
| images | list[PIL.Image] | List of loaded RGB PIL Image objects |
expand2square
| Parameter | Type | Description |
|---|---|---|
| pil_img | PIL.Image | Input image of any aspect ratio |
| background_color | tuple | RGB background color for padding |

| Returns | Type | Description |
|---|---|---|
| image | PIL.Image | Square-padded image with centered content |
process_images
| Parameter | Type | Description |
|---|---|---|
| images | list[PIL.Image] | List of PIL images to process |
| image_processor | PreTrainedImageProcessor | HuggingFace image processor |
| model_cfg | PretrainedConfig | Model config with image_aspect_ratio attribute |

| Returns | Type | Description |
|---|---|---|
| tensors | torch.Tensor or list[torch.Tensor] | Preprocessed image tensor(s) ready for model input |
Usage Examples
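import torch

from tinychat.utils.llava_image_processing import (
    load_images,
    process_images,
    vis_images,
    expand2square,
)

# Load images from mixed sources
image_files = [
    "https://example.com/photo.jpg",
    "/path/to/local/image.png",
]
images = load_images(image_files)

# Visualize in terminal
vis_images(image_files)

# Preprocess for model input.
# image_processor and model are assumed to come from the demo's model-loading code.
image_tensor = process_images(images, image_processor, model.config)
# process_images returns a list when the processed shapes differ,
# so move each tensor individually in that case.
if isinstance(image_tensor, list):
    image_tensor = [t.to("cuda:0", dtype=torch.float16) for t in image_tensor]
else:
    image_tensor = image_tensor.to("cuda:0", dtype=torch.float16)

# Manual square padding
padded = expand2square(images[0], (128, 128, 128))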
Related Pages