Implementation:Googleapis Python genai LocalTokenizer
| Knowledge Sources | |
|---|---|
| Domains | Tokenization, NLP |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
A concrete tool, provided by the Google Gen AI SDK, for counting tokens locally with SentencePiece tokenizers.
Description
The LocalTokenizer class tokenizes text locally, without making API calls. It downloads the SentencePiece tokenizer model for the given model name and caches it locally. It supports both count_tokens (returning a token count) and compute_tokens (returning details for individual tokens). This is an experimental feature.
Usage
Import this class when you need to count tokens locally for cost estimation, prompt length validation, or offline processing without consuming API quota. It is also useful for pre-flight checks before sending requests.
Code Reference
Source Location
- Repository: Googleapis_Python_genai
- File: google/genai/local_tokenizer.py
- Lines: 278-396
Signature
class LocalTokenizer:
    def __init__(self, model_name: str) -> None: ...
    def count_tokens(
        self,
        contents: Union[types.ContentListUnion, types.ContentListUnionDict],
        *,
        config: Optional[types.CountTokensConfigOrDict] = None,
    ) -> types.CountTokensResult: ...
    def compute_tokens(
        self,
        contents: Union[types.ContentListUnion, types.ContentListUnionDict],
    ) -> types.ComputeTokensResult: ...
Import
from google.genai.local_tokenizer import LocalTokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes (constructor) | Model name used to select the tokenizer (e.g. "gemini-2.0-flash") |
| contents | ContentListUnion or ContentListUnionDict | Yes (both methods) | Content to tokenize |
| config | CountTokensConfigOrDict | No (count_tokens only) | Configuration, including tools and system_instruction |
Outputs
| Name | Type | Description |
|---|---|---|
| count_tokens() returns | CountTokensResult | Token count with total_tokens field |
| compute_tokens() returns | ComputeTokensResult | Detailed token information with tokens_info |
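To make the difference between the two result types concrete, the sketch below walks a compute_tokens()-style result. It assumes each entry of tokens_info carries parallel token_ids and tokens lists; this page documents only the tokens_info field itself, so treat those attribute names as an assumption to check against the installed SDK. The helper name token_pairs is hypothetical, and the sample result is hand-built with invented ids rather than produced by a tokenizer.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def token_pairs(result: Any) -> List[Tuple[int, Any]]:
    """Flatten a compute_tokens()-style result into (token_id, token) pairs."""
    pairs: List[Tuple[int, Any]] = []
    for info in result.tokens_info or []:
        # Assumed attributes: parallel lists of ids and token pieces.
        ids = getattr(info, "token_ids", None) or []
        pieces = getattr(info, "tokens", None) or []
        pairs.extend(zip(ids, pieces))
    return pairs


# Hand-built stand-in for a ComputeTokensResult; ids and pieces are invented.
sample = SimpleNamespace(
    tokens_info=[SimpleNamespace(token_ids=[101, 102], tokens=[b"Hello", b"!"])]
)
pairs = token_pairs(sample)
print(f"{len(pairs)} tokens: {pairs}")
```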
Usage Examples
from google.genai import types
from google.genai.local_tokenizer import LocalTokenizer
# Initialize with a model name
tokenizer = LocalTokenizer("gemini-2.0-flash")
# Count tokens for a simple string
result = tokenizer.count_tokens("Hello, world!")
print(f"Token count: {result.total_tokens}")
# Count tokens for Content objects
content = types.Content(
    role="user",
    parts=[types.Part(text="What is machine learning?")],
)
result = tokenizer.count_tokens([content])
print(f"Token count: {result.total_tokens}")