Implementation:Googleapis Python genai LocalTokenizer

From Leeroopedia
Knowledge Sources
Domains Tokenization, NLP
Last Updated 2026-02-15 14:00 GMT

Overview

A concrete tool, provided by the Google Gen AI SDK, for counting tokens locally with SentencePiece tokenizers.

Description

The LocalTokenizer class provides offline text tokenization without making API calls. It downloads and caches SentencePiece tokenizer models locally and supports both count_tokens (returning a count) and compute_tokens (returning individual token details). This is an experimental feature.

Usage

Import this class when you need to count tokens locally for cost estimation, prompt length validation, or offline processing without consuming API quota. It is particularly useful for pre-flight checks before sending requests.
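The pre-flight check described above can be sketched as a small helper. This is a minimal sketch, not part of the SDK: `BudgetCheck`, `check_prompt_budget`, and `max_tokens` are hypothetical names, and the 8192 limit in the example is an arbitrary placeholder. The helper only assumes an object exposing `count_tokens(...)` whose result carries a `total_tokens` field, as documented on this page.

```python
from typing import Any, NamedTuple


class BudgetCheck(NamedTuple):
    """Outcome of a local token-budget check (hypothetical helper type)."""

    total_tokens: int
    max_tokens: int

    @property
    def fits(self) -> bool:
        # True when the prompt is within the caller-chosen limit.
        return self.total_tokens <= self.max_tokens


def check_prompt_budget(tokenizer: Any, contents: Any, max_tokens: int) -> BudgetCheck:
    """Count tokens locally and compare them with a caller-chosen limit."""
    result = tokenizer.count_tokens(contents)
    # Treat a missing count as zero so the comparison stays well defined.
    total = result.total_tokens or 0
    return BudgetCheck(total_tokens=total, max_tokens=max_tokens)


def example_with_local_tokenizer() -> None:
    """Not called automatically: requires google-genai to be installed, and
    the tokenizer model has to be downloaded before it is cached locally."""
    from google.genai.local_tokenizer import LocalTokenizer

    tokenizer = LocalTokenizer("gemini-2.0-flash")
    # 8192 is an arbitrary illustrative limit, not a documented model limit.
    check = check_prompt_budget(tokenizer, "Hello, world!", max_tokens=8192)
    print(f"{check.total_tokens} tokens, fits: {check.fits}")
```

Passing the tokenizer in as an argument keeps the check easy to unit-test with a stand-in object and avoids constructing a new tokenizer for every request.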

Code Reference

Source Location

Signature

class LocalTokenizer:
    def __init__(self, model_name: str) -> None: ...

    def count_tokens(
        self,
        contents: Union[types.ContentListUnion, types.ContentListUnionDict],
        *,
        config: Optional[types.CountTokensConfigOrDict] = None,
    ) -> types.CountTokensResult: ...

    def compute_tokens(
        self,
        contents: Union[types.ContentListUnion, types.ContentListUnionDict],
    ) -> types.ComputeTokensResult: ...

Import

from google.genai.local_tokenizer import LocalTokenizer

I/O Contract

Inputs

Name | Type | Required | Description
model_name | str | Yes (constructor) | Model name for selecting tokenizer (e.g. "gemini-2.0-flash")
contents | ContentListUnion or ContentListUnionDict | Yes | Content to tokenize
config | CountTokensConfigOrDict | No | Configuration including tools and system_instruction
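To illustrate the optional config input, the sketch below estimates how many tokens a system instruction adds by counting the same contents with and without it. The helper name `system_instruction_overhead` is hypothetical. The sketch assumes that a plain dict with a `system_instruction` key is accepted for `config` (suggested by the `CountTokensConfigOrDict` type in the signature) and that the instruction's tokens are included in `total_tokens`.

```python
from typing import Any


def system_instruction_overhead(
    tokenizer: Any, contents: Any, system_instruction: str
) -> int:
    """Return the extra tokens counted when a system instruction is supplied."""
    # Baseline: contents only, no config.
    base = tokenizer.count_tokens(contents).total_tokens or 0
    # Same contents, with the system instruction passed through config.
    # Assumption: a dict is accepted in place of a CountTokensConfig object.
    with_instruction = (
        tokenizer.count_tokens(
            contents,
            config={"system_instruction": system_instruction},
        ).total_tokens
        or 0
    )
    return with_instruction - base
```

With a real tokenizer this could be called as `system_instruction_overhead(LocalTokenizer("gemini-2.0-flash"), "Hello, world!", "Answer briefly.")`.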

Outputs

Name | Type | Description
count_tokens() returns | CountTokensResult | Token count with total_tokens field
compute_tokens() returns | ComputeTokensResult | Detailed token information with tokens_info

Usage Examples

from google.genai import types
from google.genai.local_tokenizer import LocalTokenizer

# Initialize with a model name
tokenizer = LocalTokenizer("gemini-2.0-flash")

# Count tokens for a simple string
result = tokenizer.count_tokens("Hello, world!")
print(f"Token count: {result.total_tokens}")

# Count tokens for Content objects
content = types.Content(
    role="user",
    parts=[types.Part(text="What is machine learning?")],
)
result = tokenizer.count_tokens([content])
print(f"Token count: {result.total_tokens}")
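The examples above cover count_tokens only. The sketch below shows one way the output of compute_tokens might be inspected. `flatten_tokens` is a hypothetical helper, and it assumes each entry of `tokens_info` exposes `role`, `token_ids`, and `tokens` (with token pieces as bytes); this page only confirms the `tokens_info` field itself, so check the entry layout against the installed SDK version.

```python
from typing import Any, List, Tuple


def flatten_tokens(result: Any) -> List[Tuple[str, int, str]]:
    """Flatten a compute_tokens() result into (role, token_id, token_text) rows."""
    rows: List[Tuple[str, int, str]] = []
    # tokens_info may be missing or None on an empty result.
    for info in result.tokens_info or []:
        ids = info.token_ids or []
        pieces = info.tokens or []
        for token_id, piece in zip(ids, pieces):
            # Assumption: pieces are bytes; decode defensively for display.
            if isinstance(piece, bytes):
                text = piece.decode("utf-8", errors="replace")
            else:
                text = str(piece)
            rows.append((info.role or "", token_id, text))
    return rows
```

With the tokenizer from the examples above, this could be used as `for role, token_id, text in flatten_tokens(tokenizer.compute_tokens("Hello, world!")): print(role, token_id, repr(text))`.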
