
Implementation:Mit han lab Llm awq LLaVA Image Processing

From Leeroopedia
Knowledge Sources
Domains: Vision, Preprocessing
Last Updated: 2026-02-15 00:00 GMT

Overview

Image loading and preprocessing utilities for LLaVA models, supporting URL/base64/file sources, square padding, and model-specific image processing pipelines.

Description

This module, modified from the original LLaVA codebase by Haotian Liu (Apache 2.0 licensed), provides a set of utility functions for loading and preparing images for LLaVA and VILA vision-language models.

load_image_from_base64 decodes a base64-encoded string and returns a PIL Image object, useful for handling images transmitted as encoded strings in API contexts.
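As a rough illustration of the decoding step described above, the following minimal sketch decodes a base64 string into a PIL image. It assumes Pillow is installed; the function body is an approximation of the described behavior, not a verbatim copy of the module, and the round-trip demo data is made up for illustration.

```python
import base64
from io import BytesIO

from PIL import Image


def load_image_from_base64(image: str) -> Image.Image:
    # Decode the base64 text to raw bytes, then let PIL parse them.
    return Image.open(BytesIO(base64.b64decode(image)))


# Round trip (illustrative data): encode a tiny in-memory PNG, then decode it.
buffer = BytesIO()
Image.new("RGB", (4, 2), (255, 0, 0)).save(buffer, format="PNG")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
decoded = load_image_from_base64(encoded)
```

Note that, as described, this function does not convert the result to RGB; the decoded image keeps whatever mode the encoded file had.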

load_image loads a single image from either a URL (http/https) or a local file path. For URLs, it uses requests.get to fetch the content and wraps the response bytes in a BytesIO stream. All images are converted to RGB mode to ensure consistent channel format.
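The URL-versus-path branching described above can be sketched as follows. This is a minimal approximation, assuming Pillow and requests are installed; the timeout, the raise_for_status() call, and the lazy import are additions for this sketch and are not stated in the source.

```python
import os
import tempfile
from io import BytesIO

from PIL import Image


def load_image(image_file: str) -> Image.Image:
    if image_file.startswith(("http://", "https://")):
        # Assumption: lazy import so local paths work without requests.
        import requests

        # Assumption: timeout and status check are not part of the described module.
        response = requests.get(image_file, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    else:
        image = Image.open(image_file)
    # Normalize to three channels regardless of the source mode.
    return image.convert("RGB")


# Local-file demo: a grayscale PNG comes back as RGB.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gray.png")
    Image.new("L", (3, 5), 200).save(path)
    loaded = load_image(path)
    mode, size = loaded.mode, loaded.size
```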

load_images is a batch wrapper that calls load_image for each file in a list, returning a list of PIL Image objects.

vis_images provides terminal-based image visualization using the termvisage command-line tool. A single image is displayed directly, left-aligned, at a height of 12 terminal lines. For multiple images, it uses ImageMagick's convert command to resize every image to a height of 500px, splice a 100px gap between them, and append them horizontally into a temporary file (.vis.jpg), which is then displayed as a composite.

expand2square pads a non-square PIL image to a square by adding background-colored borders. If the image is wider than it is tall, it creates a new square canvas whose side equals the width and pastes the image centered vertically; if the image is taller than it is wide, the canvas side equals the height and the image is centered horizontally. Square images are returned unchanged. The background color is typically derived from the image processor's mean pixel values.
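The padding logic described above can be sketched in a few lines. This is a minimal version assuming Pillow is installed; it follows the described behavior (integer-division centering is an assumption of this sketch) and the demo colors are illustrative.

```python
from PIL import Image


def expand2square(pil_img: Image.Image, background_color) -> Image.Image:
    width, height = pil_img.size
    if width == height:
        # Already square: return the same object unchanged.
        return pil_img
    side = max(width, height)
    result = Image.new(pil_img.mode, (side, side), background_color)
    # Center along the shorter axis; the longer axis starts at 0.
    result.paste(pil_img, ((side - width) // 2, (side - height) // 2))
    return result


red = (255, 0, 0)
blue = (0, 0, 255)

wide = expand2square(Image.new("RGB", (6, 2), red), blue)   # padded above and below
tall = expand2square(Image.new("RGB", (2, 6), red), blue)   # padded left and right
square_in = Image.new("RGB", (4, 4), red)
square_out = expand2square(square_in, blue)
```

For the 6x2 input, the content lands in rows 2 and 3 of the 6x6 canvas, with two rows of background above and below.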

process_images is the main preprocessing pipeline and integrates with HuggingFace image processors. When the model configuration sets image_aspect_ratio == "pad", each image is first padded with expand2square, using the processor's image_mean as the background color, and then preprocessed with the image processor. Otherwise, the processor is applied directly. InternViT-based processors (detected by class name) are special-cased: their output needs an extra leading dimension, added with unsqueeze(0). The function returns a stacked tensor if all processed images have the same shape, and a list of tensors otherwise.
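Two small pieces of this pipeline are easy to illustrate in isolation: turning image_mean into a padding color, and the stack-or-list return value. The sketch below assumes PyTorch is installed. The helper names mean_to_background and stack_if_uniform are hypothetical and exist only for this example, and scaling the normalized means by 255 is an assumption about how the background color is derived.

```python
import torch


def mean_to_background(image_mean):
    # Assumption: per-channel means are normalized to 0..1, so scale to 8-bit RGB.
    return tuple(int(x * 255) for x in image_mean)


def stack_if_uniform(tensors):
    # One (N, C, H, W) tensor when every shape matches; otherwise keep the list.
    if all(t.shape == tensors[0].shape for t in tensors):
        return torch.stack(tensors, dim=0)
    return tensors


background = mean_to_background([0.5, 0.25, 1.0])
same = stack_if_uniform([torch.zeros(3, 8, 8), torch.ones(3, 8, 8)])
mixed = stack_if_uniform([torch.zeros(3, 8, 8), torch.zeros(3, 4, 4)])
```

Because the return type can be either a tensor or a list, callers that move the result to a device may want to check for the list case before calling .to().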

Usage

Use these utilities in any LLaVA or VILA demo script that requires image input. The functions are imported and used in vila10_demo.py and vila15_demo.py for loading and preprocessing images before passing them to the model. They are also imported in nvila_demo.py and internvl_demo.py for terminal-based image visualization.

Code Reference

Source Location

Signature

def load_image_from_base64(image):

def load_image(image_file):

def load_images(image_files):

def vis_images(image_files):

def expand2square(pil_img, background_color):

def process_images(images, image_processor, model_cfg):

Import

from tinychat.utils.llava_image_processing import (
    load_image_from_base64,
    load_image,
    load_images,
    vis_images,
    expand2square,
    process_images,
)

I/O Contract

load_image_from_base64

Parameter | Type | Description
image | str | Base64-encoded image string

Returns | Type | Description
image | PIL.Image.Image | Decoded PIL Image object

load_image

Parameter | Type | Description
image_file | str | URL (http/https) or local file path

Returns | Type | Description
image | PIL.Image.Image | Loaded RGB PIL Image object

load_images

Parameter | Type | Description
image_files | list[str] | List of URLs or file paths

Returns | Type | Description
images | list[PIL.Image.Image] | List of loaded RGB PIL Image objects

expand2square

Parameter | Type | Description
pil_img | PIL.Image.Image | Input image of any aspect ratio
background_color | tuple | RGB background color for padding

Returns | Type | Description
image | PIL.Image.Image | Square-padded image with centered content

process_images

Parameter | Type | Description
images | list[PIL.Image.Image] | List of PIL images to process
image_processor | PreTrainedImageProcessor | HuggingFace image processor
model_cfg | PretrainedConfig | Model config with image_aspect_ratio attribute

Returns | Type | Description
tensors | torch.Tensor or list[torch.Tensor] | Preprocessed image tensor(s) ready for model input

Usage Examples

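import torch

from tinychat.utils.llava_image_processing import (
    load_images,
    process_images,
    vis_images,
    expand2square,
)

# Assumes `model` and `image_processor` were already loaded elsewhere.

# Load images from mixed sources
image_files = [
    "https://example.com/photo.jpg",
    "/path/to/local/image.png",
]
images = load_images(image_files)

# Visualize in terminal (needs termvisage; ImageMagick for multiple images)
vis_images(image_files)

# Preprocess for model input
image_tensor = process_images(images, image_processor, model.config)
# process_images returns a list when the processed shapes differ
if isinstance(image_tensor, list):
    image_tensor = [t.to("cuda:0", dtype=torch.float16) for t in image_tensor]
else:
    image_tensor = image_tensor.to("cuda:0", dtype=torch.float16)

# Manual square padding
padded = expand2square(images[0], (128, 128, 128))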
