Implementation:Mit han lab Llm awq LLaVA Image Processing

Knowledge Sources	Mit_han_lab_Llm_awq
Domains	Vision, Preprocessing
Last Updated	2026-02-15 00:00 GMT

Overview

Image loading and preprocessing utilities for LLaVA models, supporting URL/base64/file sources, square padding, and model-specific image processing pipelines.

Description

This module, modified from the original LLaVA codebase by Haotian Liu (Apache 2.0 licensed), provides a set of utility functions for loading and preparing images for LLaVA and VILA vision-language models.

load_image_from_base64 decodes a base64-encoded string and returns a PIL Image object, useful for handling images transmitted as encoded strings in API contexts.
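
The decoding step can be sketched as follows. This is a minimal illustration, assuming the input is plain base64 with no `data:` URI prefix; the helper name `decode_base64_image` is hypothetical and is not the module's own function.

```python
import base64
from io import BytesIO

from PIL import Image


def decode_base64_image(encoded: str) -> Image.Image:
    """Decode a base64 string into a PIL image (hypothetical helper)."""
    raw_bytes = base64.b64decode(encoded)
    # Wrap the bytes in a file-like object so PIL can parse the image header.
    return Image.open(BytesIO(raw_bytes))


# Round trip: encode a tiny in-memory PNG, then decode it again.
buffer = BytesIO()
Image.new("RGB", (4, 2), (255, 0, 0)).save(buffer, format="PNG")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
decoded = decode_base64_image(encoded)
```

PNG is used for the round trip because it is lossless, so the decoded pixels match the original exactly.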

load_image loads a single image from either a URL (http/https) or a local file path. For URLs, it uses requests.get to fetch the content and wraps the response bytes in a BytesIO stream. All images are converted to RGB mode to ensure consistent channel format.
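
A sketch of the URL-or-path dispatch described above. The function name `load_rgb_image` is hypothetical, and the `timeout` and `raise_for_status()` call are additions for this example rather than something the page attributes to the module.

```python
import os
import tempfile
from io import BytesIO

import requests
from PIL import Image


def load_rgb_image(source: str) -> Image.Image:
    """Load an image from an http(s) URL or a local path and convert it to RGB."""
    if source.startswith(("http://", "https://")):
        # Assumption: timeout and status check are added here for safety.
        response = requests.get(source, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    else:
        image = Image.open(source)
    # Normalize the channel layout regardless of the source format.
    return image.convert("RGB")


# Local-path demo: a grayscale file comes back as a 3-channel RGB image.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gray.png")
    Image.new("L", (3, 3), 200).save(path)
    loaded = load_rgb_image(path)
    loaded_mode, loaded_size = loaded.mode, loaded.size
```

Converting to RGB at load time means downstream code never has to special-case grayscale, palette, or RGBA inputs.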

load_images is a batch wrapper that calls load_image for each file in a list, returning a list of PIL Image objects.

vis_images provides terminal-based image visualization using the termvisage command-line tool. A single image is displayed directly, left-aligned at a height of 12 terminal lines. For multiple images, it first calls ImageMagick's convert command to resize every image to a height of 500px, splice a 100px gap between them, and append them horizontally into a temporary file (.vis.jpg), and then displays that composite.
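
The multi-image compositing step might look roughly like the command built below. This is an assumption-laden sketch: the page does not list the exact ImageMagick arguments the module passes, so the option order here is a guess that matches the described behaviour, and `build_composite_command` is a hypothetical helper. The termvisage display call is left out because its flags are not given on this page.

```python
def build_composite_command(image_files, output_path=".vis.jpg"):
    """Build an ImageMagick argv list that joins images side by side.

    Hypothetical helper: the exact arguments used by vis_images are assumed.
    """
    return [
        "convert",
        *image_files,
        "-resize", "x500",   # scale every image to a height of 500px
        "-splice", "100x0",  # add a 100px-wide strip to each image as a gap
        "+append",           # append the images horizontally
        output_path,
    ]


command = build_composite_command(["a.jpg", "b.jpg"])
# The list could be passed to subprocess.run(command, check=True)
# on a machine where ImageMagick is installed.
```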

expand2square pads a non-square PIL image to a square by adding borders filled with a background color. If the image is wider than it is tall, it creates a square canvas whose side equals the width and pastes the image centered vertically; if the image is taller than it is wide, the canvas side equals the height and the image is centered horizontally. Square images are returned unchanged. The background color is typically derived from the image processor's mean pixel values.
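
The padding geometry can be sketched as below. `pad_to_square` is a hypothetical stand-in written for illustration; it follows the behaviour described above but is not copied from the module.

```python
from PIL import Image


def pad_to_square(pil_img: Image.Image, background_color) -> Image.Image:
    """Pad an image to a square canvas, keeping the content centered."""
    width, height = pil_img.size
    if width == height:
        return pil_img  # already square, nothing to do
    side = max(width, height)
    canvas = Image.new(pil_img.mode, (side, side), background_color)
    # One of the two offsets is always zero: only the short axis is padded.
    offset = ((side - width) // 2, (side - height) // 2)
    canvas.paste(pil_img, offset)
    return canvas


# A 4x2 red image on a blue background becomes 4x4 with the red band centered.
wide = Image.new("RGB", (4, 2), (255, 0, 0))
padded = pad_to_square(wide, (0, 0, 255))
```

Padding rather than cropping or stretching keeps the full image content and its aspect ratio, at the cost of spending some pixels on background.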

process_images is the main preprocessing pipeline and integrates with HuggingFace image processors. When the model configuration specifies image_aspect_ratio == "pad", it pads each image with expand2square, using a background color derived from the processor's image_mean, and then runs the image processor on the result. Otherwise, it applies the processor directly. InternViT-based processors (detected by class name) are special-cased because they need an extra dimension added with unsqueeze(0). The function returns a single stacked tensor if all processed images have the same shape, and a list of tensors otherwise.
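
Two details of this pipeline are sketched below under stated assumptions. First, the background color: the page only says image_mean is used, so scaling the normalized mean to 0..255 (as the upstream LLaVA code does) is an assumption here. Second, the stack-or-list return: NumPy arrays stand in for torch tensors so the example stays lightweight. Both helper names are hypothetical.

```python
import numpy as np


def background_from_mean(image_mean):
    """Scale a normalized per-channel mean (0..1) to an 8-bit RGB tuple."""
    return tuple(int(channel * 255) for channel in image_mean)


def stack_if_uniform(arrays):
    """Return one stacked array if all shapes match, else the original list."""
    if arrays and all(a.shape == arrays[0].shape for a in arrays):
        return np.stack(arrays, axis=0)
    return arrays


# Example mean values as used by CLIP-style processors.
clip_mean = [0.48145466, 0.4578275, 0.40821073]
background = background_from_mean(clip_mean)

# Same-shaped inputs collapse into one batch array.
same = stack_if_uniform([np.zeros((3, 4, 4)), np.zeros((3, 4, 4))])
# Mixed shapes cannot be stacked, so the list is returned unchanged.
mixed = stack_if_uniform([np.zeros((3, 4, 4)), np.zeros((3, 2, 2))])
```

Because the return type depends on the input shapes, callers should check for a list before calling tensor methods on the result.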

Usage

Use these utilities in any LLaVA or VILA demo script that requires image input. The functions are imported and used in vila10_demo.py and vila15_demo.py for loading and preprocessing images before passing them to the model. They are also imported in nvila_demo.py and internvl_demo.py for terminal-based image visualization.

Code Reference

Source Location

Repository: Mit_han_lab_Llm_awq
File: tinychat/utils/llava_image_processing.py
Lines: 1-113

Signature

def load_image_from_base64(image):

def load_image(image_file):

def load_images(image_files):

def vis_images(image_files):

def expand2square(pil_img, background_color):

def process_images(images, image_processor, model_cfg):

Import

from tinychat.utils.llava_image_processing import (
    load_image_from_base64,
    load_image,
    load_images,
    vis_images,
    expand2square,
    process_images,
)

I/O Contract

load_image_from_base64

Parameter	Type	Description
image	str	Base64-encoded image string

Returns	Type	Description
image	PIL.Image	Decoded PIL Image object

load_image

Parameter	Type	Description
image_file	str	URL (http/https) or local file path

Returns	Type	Description
image	PIL.Image	Loaded RGB PIL Image object

load_images

Parameter	Type	Description
image_files	list[str]	List of URLs or file paths

Returns	Type	Description
images	list[PIL.Image]	List of loaded RGB PIL Image objects

expand2square

Parameter	Type	Description
pil_img	PIL.Image	Input image of any aspect ratio
background_color	tuple	RGB background color for padding

Returns	Type	Description
image	PIL.Image	Square-padded image with centered content

process_images

Parameter	Type	Description
images	list[PIL.Image]	List of PIL images to process
image_processor	PreTrainedImageProcessor	HuggingFace image processor
model_cfg	PretrainedConfig	Model config with image_aspect_ratio attribute

Returns	Type	Description
tensors	torch.Tensor or list[torch.Tensor]	Preprocessed image tensor(s) ready for model input

Usage Examples

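import torch

from tinychat.utils.llava_image_processing import (
    load_images,
    process_images,
    vis_images,
    expand2square,
)

# Load images from mixed sources (a URL and a local path)
image_files = [
    "https://example.com/photo.jpg",
    "/path/to/local/image.png",
]
images = load_images(image_files)

# Visualize in terminal (needs termvisage; ImageMagick for multiple images)
vis_images(image_files)

# Preprocess for model input.
# `image_processor` and `model` are assumed to come from an already-loaded
# LLaVA or VILA model; they are not defined in this snippet.
image_tensor = process_images(images, image_processor, model.config)

# process_images returns a list when the processed shapes differ
if isinstance(image_tensor, list):
    image_tensor = [t.to("cuda:0", dtype=torch.float16) for t in image_tensor]
else:
    image_tensor = image_tensor.to("cuda:0", dtype=torch.float16)

# Manual square padding with a mid-gray background
padded = expand2square(images[0], (128, 128, 128))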

Related Pages

Principle:Mit_han_lab_Llm_awq_Dynamic_Image_Video_Preprocessing
