Principle:SeldonIO Seldon core HuggingFace Text Inference

From Leeroopedia
Field | Value
Overview | Sending text-based inference requests to HuggingFace models using the V2 protocol BYTES datatype.
Domains | NLP, Inference
Related Implementation | SeldonIO_Seldon_core_Seldon_Model_Infer_BYTES
Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/)
Last Updated | 2026-02-13 00:00 GMT

Description

HuggingFace models served through the MLServer HuggingFace runtime accept text input via the V2 BYTES datatype. Text strings are passed directly in the data array. REST requests carry plain strings; gRPC requests written as JSON (for example, through the Seldon CLI) carry base64-encoded strings. Sentiment and text-generation models take text in this format, and Whisper takes audio bytes under the same datatype. The outputs differ by model type: labels and scores, generated text, or transcriptions.

The inference request structure follows the V2 Inference Protocol:

  • Input tensor name -- typically "args" for HuggingFace models
  • Shape -- [N] where N is the number of input strings (batch size)
  • Datatype -- always "BYTES" for text input
  • Data -- array of text strings (REST) or base64-encoded strings (gRPC)
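
The four fields above can be assembled into a REST request body as in the following minimal sketch. The helper name build_text_request is hypothetical, and the default input name "args" is only the typical value noted above. The body would normally be POSTed to the V2 endpoint /v2/models/<model-name>/infer; the host and model name depend on the deployment and are not shown.

```python
import json


def build_text_request(texts, input_name="args"):
    """Build a V2 inference request body for a batch of text strings.

    Hypothetical helper for illustration; "args" is the typical input
    name for HuggingFace models, not a guaranteed one.
    """
    texts = list(texts)
    if not texts:
        raise ValueError("at least one input string is required")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],   # [N], one entry per input string
                "datatype": "BYTES",     # always BYTES for text input
                "data": texts,           # plain strings for REST
            }
        ]
    }


payload = build_text_request(["This is great!", "This is terrible."])
body = json.dumps(payload)  # JSON body to send to the inference endpoint
print(body)
```

Keeping the shape derived from len(texts) avoids a mismatch between the declared batch size and the data array.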

The response format varies by model type:

  • Sentiment analysis -- returns labels (e.g., "POSITIVE", "NEGATIVE") and confidence scores
  • Text generation -- returns generated text continuations
  • Speech-to-text (Whisper) -- returns transcribed text from audio input
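
Because the output differs by model type, client code usually needs a tolerant decoder. The sketch below assumes that each element of an output's data array is either a JSON-encoded string (for example, a label and score pair) or plain text; that layout, the sample response, and the helper name decode_outputs are illustrative assumptions, not a description of what a particular runtime version returns.

```python
import json


def decode_outputs(response):
    """Flatten the outputs of a V2 response into a list of Python values.

    Assumption: BYTES elements are either JSON-encoded strings or plain
    text. Elements that do not parse as JSON are returned unchanged.
    """
    results = []
    for output in response.get("outputs", []):
        for item in output.get("data", []):
            if isinstance(item, str):
                try:
                    item = json.loads(item)
                except json.JSONDecodeError:
                    pass  # plain text such as generated or transcribed text
            results.append(item)
    return results


# Hypothetical sentiment response, for illustration only.
sample_response = {
    "model_name": "sentiment",
    "outputs": [
        {
            "name": "output",
            "shape": [1],
            "datatype": "BYTES",
            "data": ['{"label": "POSITIVE", "score": 0.99}'],
        }
    ],
}

print(decode_outputs(sample_response))
```

The JSON fallback is a heuristic: plain text that happens to be valid JSON (such as a bare number) would be parsed rather than kept as a string, so a client that knows its model type can skip the attempt.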

Theoretical Basis

The BYTES datatype in the V2 protocol handles variable-length opaque data including text strings. This bridges the gap between the tensor-oriented V2 protocol and NLP models that expect raw text input. The MLServer HuggingFace runtime internally tokenizes the text using the model's tokenizer.

The data flow for text inference is:

  1. The client sends raw text in the "data" field with datatype: "BYTES".
  2. The MLServer HuggingFace runtime receives the V2 request and extracts the text strings.
  3. The runtime passes the text through the model's tokenizer to produce input tensors (token IDs, attention masks).
  4. The tokenized input is fed through the model for inference.
  5. The model output (logits, generated tokens) is decoded back into a human-readable form, such as labels with scores or text.
  6. The response is formatted as a V2 response with BYTES output tensors.
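
The six steps can be traced end to end with a toy stand-in. Nothing below is MLServer or HuggingFace code: toy_tokenize, toy_model, and the two-word vocabulary are hypothetical placeholders chosen only to make the flow from a BYTES request to a BYTES response concrete.

```python
def toy_tokenize(text, vocab):
    """Stand-in for step 3: map words to integer IDs (0 = unknown)."""
    return [vocab.get(word.lower().strip(".,!?"), 0) for word in text.split()]


def toy_model(token_ids):
    """Stand-in for step 4: return [negative, positive] scores."""
    positive = sum(1 for t in token_ids if t == 1)
    negative = sum(1 for t in token_ids if t == 2)
    return [negative, positive]


def handle_request(request, vocab):
    """Toy version of the server side of the flow (steps 2 to 6)."""
    texts = request["inputs"][0]["data"]              # step 2: extract strings
    labels = []
    for text in texts:
        token_ids = toy_tokenize(text, vocab)         # step 3: tokenize
        scores = toy_model(token_ids)                 # step 4: inference
        label = "POSITIVE" if scores[1] >= scores[0] else "NEGATIVE"
        labels.append(label)                          # step 5: decode
    return {                                          # step 6: V2 response
        "outputs": [
            {
                "name": "output",
                "shape": [len(labels)],
                "datatype": "BYTES",
                "data": labels,
            }
        ]
    }


vocab = {"great": 1, "terrible": 2}                   # hypothetical vocabulary
request = {                                           # step 1: client request
    "inputs": [
        {
            "name": "args",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["This is great!", "This is terrible."],
        }
    ]
}
response = handle_request(request, vocab)
print(response["outputs"][0]["data"])
```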

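For gRPC transport, BYTES elements travel in a Protocol Buffers bytes field. The binary wire format carries those bytes as-is; base64 comes into play when the gRPC request is written as JSON, as with the Seldon CLI, because the Protocol Buffers JSON mapping represents bytes fields as base64 strings. In that case the client encodes text to base64 before sending, and decodes any base64-encoded values in the response back to text.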
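
The encode and decode step can be sketched with the Python standard library. The helper names are hypothetical, and the "contents" / "bytes_contents" layout is an assumption based on the V2 gRPC message definitions; it should be checked against the client or CLI version in use.

```python
import base64


def encode_text(text):
    """Encode a text string as base64 for a JSON-form gRPC request."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded):
    """Decode a base64 string from a response back to text."""
    return base64.b64decode(encoded).decode("utf-8")


def build_grpc_json_request(texts, input_name="args"):
    """Build the JSON form of a gRPC inference request (assumed layout)."""
    texts = list(texts)
    return {
        "inputs": [
            {
                "name": input_name,
                "contents": {
                    "bytes_contents": [encode_text(t) for t in texts]
                },
                "datatype": "BYTES",
                "shape": [len(texts)],
            }
        ]
    }


grpc_request = build_grpc_json_request(["This is great!"])
encoded = grpc_request["inputs"][0]["contents"]["bytes_contents"][0]
print(encoded, "->", decode_text(encoded))
```

Encoding through UTF-8 first keeps non-ASCII text intact across the round trip.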

Usage

This principle applies when sending text or audio input to deployed HuggingFace models for inference, including:

  • Sending single or batched text strings for sentiment analysis
  • Providing text prompts for auto-regressive text generation
  • Submitting audio data for speech-to-text transcription via Whisper
  • Using either REST or gRPC transport protocols

Related Pages

Implementation:SeldonIO_Seldon_core_Seldon_Model_Infer_BYTES
