Implementation:Bentoml BentoML SDK Validators

From Leeroopedia
Domains: SDK, Validation, Type System
Last Updated: 2026-02-13 15:00 GMT

Overview

Provides Pydantic-compatible validator classes and schema annotations for rich data types used in BentoML service APIs, including PIL images, file paths, tensors (NumPy, TensorFlow, PyTorch), and Pandas DataFrames.

Description

This module defines custom Pydantic core schema providers and JSON schema generators that let BentoML handle non-standard data types in service I/O contracts. Each validator integrates with Pydantic v2's __get_pydantic_core_schema__ and __get_pydantic_json_schema__ protocols. The key components are:

  1. PILImageEncoder: Handles encoding and decoding of PIL Image objects.
    • decode(): Accepts bytes, BinaryIO, UploadFile, or an existing PIL Image. Parses the image format from the Content-Type header.
    • encode(): Serializes a PIL Image to bytes (PNG by default).
    • JSON schema: Reports "type": "file", "format": "image" in validation mode, and "type": "string", "format": "binary" in serialization mode.
  2. FileSchema: An attrs-based validator for file/path types with optional content type validation.
    • decode(): Accepts bytes, BinaryIO, UploadFile, PurePath, or str. Writes the received data to a temporary file in the request's temp directory and returns its Path. Validates the content type against the expected pattern using fnmatch.
    • encode(): Reads the file at a Path and returns its contents as bytes.
    • JSON schema: Reports a file type with the format and an optional content_type constraint.
  3. TensorSchema: An immutable attrs class for tensor validation and serialization, supporting NumPy, TensorFlow, and PyTorch tensors.
    • validate(): Converts input data to the target tensor format, applying dtype and shape constraints.
    • encode(): Serializes tensors to JSON-compatible nested lists (in JSON mode) or NumPy arrays (in Python mode). During Arrow serialization, arrays are flattened.
    • framework_dtype: Property that maps string dtype names to framework-specific dtype objects.
    • dim: Computed property returning the total number of elements implied by the shape (the product of its dimensions).
    • JSON schema: Reports tensor metadata (format, dtype, shape, dim) in validation mode, and a nested array schema in serialization mode.
  4. DataframeSchema: An immutable attrs class for Pandas DataFrame validation.
    • validate(): Converts input (a dict or a list of dicts) to a DataFrame, with optional column specification.
    • encode(): Serializes to records (list of dicts) or columns (dict of lists) orientation.
    • JSON schema: Reports DataFrame metadata in validation mode, and an array or object schema matching the orientation in serialization mode.
  5. Metadata annotations: Three simple frozen attrs classes extending annotated_types.BaseMetadata:
    • ContentType(content_type: str): Annotates a type with its expected MIME type.
    • Shape(dimensions: tuple[int, ...]): Annotates tensor shape constraints.
    • DType(dtype: str): Annotates tensor dtype constraints.
  6. arrow_serialization(): A context manager that sets a global flag (__in_arrow_serialization__) to signal that tensor serialization should flatten arrays for Apache Arrow compatibility.
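
The global-flag pattern behind arrow_serialization() can be sketched with contextlib. This is a minimal, stdlib-only illustration, not the module's source: encode_tensor is a hypothetical stand-in for TensorSchema.encode that operates on nested Python lists instead of real tensors. As a design choice, the sketch restores the previous flag value on exit (so nested blocks behave sensibly) and does so in a finally clause, so an exception inside the block cannot leave the flag set; the real implementation may handle nesting differently.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List

# Module-level flag, named after the __in_arrow_serialization__ global described above.
__in_arrow_serialization__ = False


@contextmanager
def arrow_serialization() -> Iterator[None]:
    """Turn the flag on for the duration of the with-block."""
    global __in_arrow_serialization__
    previous = __in_arrow_serialization__
    __in_arrow_serialization__ = True
    try:
        yield
    finally:
        # Restore the prior value even if the body raises.
        __in_arrow_serialization__ = previous


def _flatten(value: Any) -> List[Any]:
    """Recursively flatten a nested list into one flat list."""
    if isinstance(value, list):
        out: List[Any] = []
        for item in value:
            out.extend(_flatten(item))
        return out
    return [value]


def encode_tensor(nested: List[Any]) -> List[Any]:
    """Hypothetical stand-in for TensorSchema.encode on nested lists."""
    if __in_arrow_serialization__:
        return _flatten(nested)
    return nested
```

Outside the block, encode_tensor([[1, 2], [3, 4]]) returns the nested list unchanged; inside `with arrow_serialization():` it returns [1, 2, 3, 4].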

Usage

These validators are used as Pydantic Annotated type annotations in BentoML service method signatures. They are consumed by the IODescriptor system to generate correct schemas and handle serialization/deserialization of complex data types in HTTP requests and responses.
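
To illustrate how metadata attached with Annotated can be recovered from a method signature, here is a stdlib-only sketch. It is not BentoML's IODescriptor code: ContentTypeMeta is a hypothetical frozen dataclass standing in for the attrs-based ContentType, and collect_metadata is an illustrative helper. It relies on typing.get_type_hints(..., include_extras=True), which keeps the Annotated wrappers (and therefore the metadata objects) instead of stripping them to the bare type.

```python
import typing
from dataclasses import dataclass
from pathlib import Path
from typing import Annotated, Any, Dict, Tuple


@dataclass(frozen=True)
class ContentTypeMeta:
    """Hypothetical stand-in for the ContentType annotation."""

    content_type: str


def classify_image(image: Annotated[Path, ContentTypeMeta("image/png")]) -> dict:
    return {"class": "cat"}


def collect_metadata(func: Any) -> Dict[str, Tuple[Any, Tuple[Any, ...]]]:
    """Map each parameter name to (base type, tuple of Annotated metadata)."""
    hints = typing.get_type_hints(func, include_extras=True)
    result: Dict[str, Tuple[Any, Tuple[Any, ...]]] = {}
    for name, hint in hints.items():
        if name == "return":
            continue  # only parameters are of interest here
        if typing.get_origin(hint) is Annotated:
            base, *extras = typing.get_args(hint)
            result[name] = (base, tuple(extras))
        else:
            result[name] = (hint, ())
    return result
```

For classify_image, collect_metadata returns {"image": (Path, (ContentTypeMeta("image/png"),))}, which is the kind of information a schema generator needs to emit a file type with a content-type constraint.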

Code Reference

Source Location

Signature

class PILImageEncoder:
    def decode(self, obj: bytes | t.BinaryIO | UploadFile | PILImage.Image) -> t.Any: ...
    def encode(self, obj: PILImage.Image) -> bytes: ...

@attrs.define
class FileSchema:
    format: str = "binary"
    content_type: str | None = None
    def decode(self, obj: bytes | t.BinaryIO | UploadFile | PurePath | str) -> t.Any: ...
    def encode(self, obj: Path) -> bytes: ...

@attrs.frozen(unsafe_hash=True)
class TensorSchema:
    format: TensorFormat
    dtype: t.Optional[str] = None
    shape: t.Optional[t.Tuple[int, ...]] = None
    def validate(self, obj: t.Any) -> t.Any: ...
    def encode(self, arr: TensorType, info: core_schema.SerializationInfo) -> t.Any: ...

@attrs.frozen(unsafe_hash=True)
class DataframeSchema:
    orient: str = "records"
    columns: tuple[str] | None = None
    def validate(self, obj: t.Any) -> pd.DataFrame: ...
    def encode(self, df: pd.DataFrame, info: core_schema.SerializationInfo) -> t.Any: ...

@attrs.frozen
class ContentType(BaseMetadata):
    content_type: str

@attrs.frozen
class Shape(BaseMetadata):
    dimensions: tuple[int, ...]

@attrs.frozen
class DType(BaseMetadata):
    dtype: str
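
DataframeSchema.encode() is described as emitting either records (a list of dicts) or columns (a dict of lists). The two output shapes can be shown without pandas; to_orient below is a hypothetical helper that works on plain Python rows, not part of _bentoml_sdk. For reference, in pandas a list of dicts corresponds to DataFrame.to_dict(orient="records") and a dict of lists to DataFrame.to_dict(orient="list").

```python
from typing import Any, Sequence


def to_orient(
    columns: Sequence[str], rows: Sequence[Sequence[Any]], orient: str = "records"
) -> Any:
    """Encode rows as 'records' (list of dicts) or 'columns' (dict of lists)."""
    if orient == "records":
        # One dict per row, keyed by column name.
        return [dict(zip(columns, row)) for row in rows]
    if orient == "columns":
        # One list per column, holding that column's value from every row.
        return {col: [row[i] for row in rows] for i, col in enumerate(columns)}
    raise ValueError(f"unsupported orient: {orient!r}")
```

With columns ["a", "b"] and rows [[1, "x"], [2, "y"]], records gives [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}] and columns gives {"a": [1, 2], "b": ["x", "y"]}.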

Import

from _bentoml_sdk.validators import ContentType
from _bentoml_sdk.validators import TensorSchema
from _bentoml_sdk.validators import DataframeSchema
from _bentoml_sdk.validators import FileSchema
from _bentoml_sdk.validators import PILImageEncoder
from _bentoml_sdk.validators import Shape, DType

I/O Contract

Inputs

Name | Type | Required | Description
obj (PILImageEncoder) | bytes / BinaryIO / UploadFile / PILImage.Image | Yes | Raw image data to decode into a PIL Image
obj (FileSchema) | bytes / BinaryIO / UploadFile / PurePath / str | Yes | File data to decode into a filesystem Path
obj (TensorSchema) | list / ndarray / Tensor | Yes | Numeric data to validate and convert to a specific tensor type
obj (DataframeSchema) | dict / list[dict] / DataFrame | Yes | Tabular data to validate and convert to a Pandas DataFrame

Outputs

Name | Type | Description
PILImage.Image | PIL Image | Decoded image object
Path | pathlib.Path | Path to a temporary file containing the decoded file data
TensorType | ndarray / tf.Tensor / torch.Tensor | Validated tensor in the target framework format
pd.DataFrame | Pandas DataFrame | Validated DataFrame with optional column constraints
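
The FileSchema decode path (check the content type with fnmatch, write the incoming data to a temporary file, return its Path) can be approximated with the standard library. This is a hedged sketch under stated assumptions: decode_to_temp_file is a hypothetical helper, it writes into a caller-supplied directory rather than BentoML's per-request temp directory, and it only handles bytes and binary file objects (not UploadFile, PurePath, or str).

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import BinaryIO, Optional, Union


def decode_to_temp_file(
    data: Union[bytes, BinaryIO],
    directory: Path,
    received_type: Optional[str] = None,
    expected_type: Optional[str] = None,
) -> Path:
    """Write data to a new file in `directory` and return its path.

    If `expected_type` is given, `received_type` must match it as an
    fnmatch-style pattern, e.g. "image/*" matches "image/png".
    """
    if expected_type is not None:
        if received_type is None or not fnmatch.fnmatch(received_type, expected_type):
            raise ValueError(
                f"content type {received_type!r} does not match {expected_type!r}"
            )
    # Accept raw bytes or any readable binary file object.
    payload = data if isinstance(data, bytes) else data.read()
    # delete=False keeps the file on disk after closing so the path stays usable.
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as fh:
        fh.write(payload)
    return Path(fh.name)
```

Using a glob-style pattern means an expected type such as "image/*" accepts any image subtype, while an exact string such as "image/png" accepts only that type.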

Usage Examples

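from pathlib import Path
from typing import Annotated

import numpy as np

import bentoml
from _bentoml_sdk.validators import ContentType, TensorSchema


@bentoml.service
class ImageService:
    @bentoml.api
    def predict(
        self,
        # A float32 NumPy array of shape (1, 224, 224, 3), e.g. one RGB image.
        tensor: Annotated[np.ndarray, TensorSchema(format="numpy-array", dtype="float32", shape=(1, 224, 224, 3))],
    ) -> np.ndarray:
        # Placeholder inference (per-channel mean); replace with a call to your own model.
        return tensor.mean(axis=(1, 2))

    @bentoml.api
    def classify_image(
        self,
        # The upload arrives as a temporary file Path; ContentType declares the expected MIME type.
        image: Annotated[Path, ContentType("image/png")],
    ) -> dict:
        # Placeholder result; replace with real classification logic.
        return {"class": "cat"}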
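
The dim property described for TensorSchema is the product of the shape's dimensions; for the example's shape=(1, 224, 224, 3) that is 1 × 224 × 224 × 3 = 150528 elements. The sketch below shows that arithmetic plus a simple shape check on nested lists. It is a stdlib-only approximation: dim, infer_shape, and check_shape are hypothetical helpers, infer_shape assumes a rectangular nested list (it only inspects the first element at each level), and the real TensorSchema.validate() converts data to NumPy, TensorFlow, or PyTorch tensors rather than inspecting Python lists.

```python
import math
from typing import Any, List, Optional, Tuple


def dim(shape: Optional[Tuple[int, ...]]) -> Optional[int]:
    """Total element count implied by a shape, or None when no shape is set."""
    return None if shape is None else math.prod(shape)


def infer_shape(data: Any) -> Tuple[int, ...]:
    """Infer the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape: List[int] = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]  # assumes every sibling has the same length
    return tuple(shape)


def check_shape(data: Any, expected: Tuple[int, ...]) -> None:
    """Raise ValueError if the nested list's shape differs from `expected`."""
    actual = infer_shape(data)
    if actual != expected:
        raise ValueError(f"expected shape {expected}, got {actual}")
```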
