Implementation:Bentoml BentoML SDK Validators
| Knowledge Sources | |
|---|---|
| Domains | SDK, Validation, Type System |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides Pydantic-compatible validator classes and schema annotations for rich data types used in BentoML service APIs, including PIL images, file paths, tensors (NumPy, TensorFlow, PyTorch), and Pandas DataFrames.
Description
This module defines custom Pydantic core schema providers and JSON schema generators that enable BentoML to handle non-standard data types in service I/O contracts. Each validator integrates with Pydantic v2's __get_pydantic_core_schema__ and __get_pydantic_json_schema__ protocols. The key classes are:
- PILImageEncoder: Handles encoding and decoding of PIL Image objects.
  - decode(): Accepts bytes, BinaryIO, UploadFile, or an existing PIL Image. Parses the image format from Content-Type headers.
  - encode(): Serializes a PIL Image to bytes (defaults to PNG format).
  - JSON schema: Reports "type": "file", "format": "image" for validation mode, and "type": "string", "format": "binary" for serialization mode.
- FileSchema: An attrs-based validator for file/path types with optional content type validation.
  - decode(): Accepts bytes, BinaryIO, UploadFile, PurePath, or str. Writes received data to a temporary file in the request temp directory and returns the Path. Validates the content type against the expected pattern using fnmatch.
  - encode(): Reads a Path object to bytes.
  - JSON schema: Reports file type with format and an optional content_type constraint.
- TensorSchema: An immutable attrs class for tensor validation and serialization, supporting NumPy, TensorFlow, and PyTorch tensors.
  - validate(): Converts input data to the target tensor format, applying dtype and shape constraints.
  - encode(): Serializes tensors to JSON-compatible nested lists (for JSON mode) or NumPy arrays (for Python mode). Handles arrow serialization by flattening arrays.
  - framework_dtype: Property that maps string dtype names to framework-specific dtype objects.
  - dim: Computed property returning the total number of elements from the shape.
  - JSON schema: Reports tensor metadata (format, dtype, shape, dim) for validation mode, and a nested array schema for serialization mode.
- DataframeSchema: An immutable attrs class for Pandas DataFrame validation.
  - validate(): Converts input (dict or list of dicts) to a DataFrame with optional column specification.
  - encode(): Serializes to records (list of dicts) or columns (dict of lists) orientation.
  - JSON schema: Reports dataframe metadata for validation mode, and an array/object schema matching the orientation for serialization mode.
- Metadata annotations: Three simple frozen attrs classes extending annotated_types.BaseMetadata:
  - ContentType(content_type: str): Annotates a type with its expected MIME type.
  - Shape(dimensions: tuple[int, ...]): Annotates tensor shape constraints.
  - DType(dtype: str): Annotates tensor dtype constraints.
- arrow_serialization(): A context manager that sets a global flag (__in_arrow_serialization__) to signal that tensor serialization should flatten arrays for Apache Arrow compatibility.
Usage
These validators are used as Pydantic Annotated type annotations in BentoML service method signatures. They are consumed by the IODescriptor system to generate correct schemas and handle serialization/deserialization of complex data types in HTTP requests and responses.
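How metadata attached through Annotated can be read back from a method signature is sketched below using only the standard library. The dataclasses are stand-ins for the attrs-based ContentType and Shape classes, and collect_metadata is a hypothetical helper; BentoML's own IODescriptor logic is not shown here and may work differently.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class ContentType:
    """Stand-in for the metadata class of the same name."""
    content_type: str


@dataclass(frozen=True)
class Shape:
    """Stand-in for the metadata class of the same name."""
    dimensions: tuple


def classify(
    image: Annotated[bytes, ContentType("image/png")],
    tensor: Annotated[list, Shape((2, 3))],
) -> dict:
    return {}


def collect_metadata(func):
    """Map each Annotated parameter to (base type, list of metadata)."""
    hints = get_type_hints(func, include_extras=True)
    result = {}
    for name, hint in hints.items():
        if name == "return" or get_origin(hint) is not Annotated:
            continue
        # For Annotated[T, m1, m2], get_args returns (T, m1, m2).
        base, *metadata = get_args(hint)
        result[name] = (base, metadata)
    return result


meta = collect_metadata(classify)
```

Passing include_extras=True matters: without it, get_type_hints strips the Annotated wrapper and the metadata is lost.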
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/_bentoml_sdk/validators.py
- Lines: 1-361
Signature
class PILImageEncoder:
    def decode(self, obj: bytes | t.BinaryIO | UploadFile | PILImage.Image) -> t.Any: ...
    def encode(self, obj: PILImage.Image) -> bytes: ...

@attrs.define
class FileSchema:
    format: str = "binary"
    content_type: str | None = None
    def decode(self, obj: bytes | t.BinaryIO | UploadFile | PurePath | str) -> t.Any: ...
    def encode(self, obj: Path) -> bytes: ...

@attrs.frozen(unsafe_hash=True)
class TensorSchema:
    format: TensorFormat
    dtype: t.Optional[str] = None
    shape: t.Optional[t.Tuple[int, ...]] = None
    def validate(self, obj: t.Any) -> t.Any: ...
    def encode(self, arr: TensorType, info: core_schema.SerializationInfo) -> t.Any: ...

@attrs.frozen(unsafe_hash=True)
class DataframeSchema:
    orient: str = "records"
    columns: tuple[str] | None = None
    def validate(self, obj: t.Any) -> pd.DataFrame: ...
    def encode(self, df: pd.DataFrame, info: core_schema.SerializationInfo) -> t.Any: ...

@attrs.frozen
class ContentType(BaseMetadata):
    content_type: str

@attrs.frozen
class Shape(BaseMetadata):
    dimensions: tuple[int, ...]

@attrs.frozen
class DType(BaseMetadata):
    dtype: str
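The relationship between the shape field of TensorSchema and the dim property mentioned in the Description (total number of elements) can be sketched as follows. The function names total_elements and infer_shape are hypothetical, nested lists stand in for tensors, and the real property may be implemented differently.

```python
import math
from typing import Optional, Tuple


def total_elements(shape: Optional[Tuple[int, ...]]) -> Optional[int]:
    """Product of all dimensions, or None when no shape is declared."""
    if shape is None:
        return None
    return math.prod(shape)


def infer_shape(nested) -> Tuple[int, ...]:
    """Shape of a rectangular nested list, read along the first element."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


declared = (1, 224, 224, 3)
element_count = total_elements(declared)
sample_shape = infer_shape([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

A validator could compare a value such as sample_shape against the declared shape before converting the input, although this sketch does not claim that TensorSchema.validate() performs the check this way.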
Import
from _bentoml_sdk.validators import ContentType
from _bentoml_sdk.validators import TensorSchema
from _bentoml_sdk.validators import DataframeSchema
from _bentoml_sdk.validators import FileSchema
from _bentoml_sdk.validators import PILImageEncoder
from _bentoml_sdk.validators import Shape, DType
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| obj (PILImageEncoder) | bytes / BinaryIO / UploadFile / PILImage.Image | Yes | Raw image data to decode into a PIL Image |
| obj (FileSchema) | bytes / BinaryIO / UploadFile / PurePath / str | Yes | File data to decode into a filesystem Path |
| obj (TensorSchema) | list / ndarray / Tensor | Yes | Numeric data to validate and convert to a specific tensor type |
| obj (DataframeSchema) | dict / list[dict] / DataFrame | Yes | Tabular data to validate and convert to a Pandas DataFrame |
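For the FileSchema row, the decode path described earlier (content type check with fnmatch, then a write to a temporary file) might look roughly like the sketch below. decode_to_temp_file and its parameters are hypothetical; in particular, how the real decoder obtains the request temp directory and the received content type is not shown in this page.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import Optional


def decode_to_temp_file(
    data: bytes,
    received_type: Optional[str],
    expected_pattern: Optional[str],
    temp_dir: str,
) -> Path:
    """Validate the content type, then persist the bytes and return the path."""
    if expected_pattern is not None and received_type is not None:
        # Shell-style matching, so "image/*" accepts "image/png".
        if not fnmatch.fnmatch(received_type, expected_pattern):
            raise ValueError(
                f"content type {received_type!r} does not match {expected_pattern!r}"
            )
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(dir=temp_dir, delete=False) as handle:
        handle.write(data)
        return Path(handle.name)


with tempfile.TemporaryDirectory() as request_dir:
    path = decode_to_temp_file(b"\x89PNG", "image/png", "image/*", request_dir)
    stored = path.read_bytes()
    try:
        decode_to_temp_file(b"{}", "application/json", "image/*", request_dir)
        rejected = False
    except ValueError:
        rejected = True
```

Tying the file to a per-request directory means cleanup can happen in one place when the request ends, which is the role TemporaryDirectory plays in this sketch.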
Outputs
| Name | Type | Description |
|---|---|---|
| PILImage.Image | PIL Image | Decoded image object |
| Path | pathlib.Path | Path to a temporary file containing the decoded file data |
| TensorType | ndarray / tf.Tensor / torch.Tensor | Validated tensor in the target framework format |
| pd.DataFrame | Pandas DataFrame | Validated DataFrame with optional column constraints |
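The two serialized layouts named for DataframeSchema, records (list of dicts) and columns (dict of lists), carry the same data in different shapes. The pure-Python sketch below only illustrates those shapes; to_records and to_columns are hypothetical helpers and do not use pandas or the module's encode() method.

```python
def to_records(columns: dict) -> list:
    """Convert a dict of equal-length lists into a list of row dicts."""
    names = list(columns)
    row_count = len(columns[names[0]]) if names else 0
    return [{name: columns[name][i] for name in names} for i in range(row_count)]


def to_columns(records: list) -> dict:
    """Convert a list of row dicts into a dict of column lists."""
    result = {}
    for row in records:
        for key, value in row.items():
            result.setdefault(key, []).append(value)
    return result


column_layout = {"name": ["cat", "dog"], "score": [0.9, 0.1]}
record_layout = to_records(column_layout)
round_trip = to_columns(record_layout)
```

The records layout is usually easier to read row by row, while the columns layout avoids repeating the column names for every row.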
Usage Examples
from pathlib import Path
from typing import Annotated

import numpy as np

import bentoml
from _bentoml_sdk.validators import ContentType, TensorSchema


@bentoml.service
class ImageService:
    @bentoml.api
    def predict(
        self,
        tensor: Annotated[
            np.ndarray,
            TensorSchema(format="numpy-array", dtype="float32", shape=(1, 224, 224, 3)),
        ],
    ) -> np.ndarray:
        # self.model is a placeholder: load a model (e.g. in __init__) before calling this.
        return self.model.predict(tensor)

    @bentoml.api
    def classify_image(
        self,
        image: Annotated[Path, ContentType("image/png")],
    ) -> dict:
        # Placeholder result; image is the Path of the decoded temporary file.
        return {"class": "cat"}