Implementation: OpenAI Python Embeddings Create
| Knowledge Sources | |
|---|---|
| Domains | NLP, Embeddings, Semantic_Search |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool from the OpenAI Python SDK for generating text embedding vectors with configurable output dimensions.
Description
The Embeddings resource provides a create() method that generates dense vector representations of text. It supports batch embedding (multiple texts in one call), configurable output dimensions (text-embedding-3-* models), and optional base64 encoding. When base64 encoding is requested and numpy is available, the SDK automatically decodes vectors to numpy arrays.
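The base64 path can be illustrated without the SDK: a base64-encoded embedding is simply the little-endian float32 bytes of the vector. A minimal sketch using only the standard library (the vector values here are made up for illustration, not real model output):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values into a float list."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip a small made-up vector to show the wire format.
vector = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vector)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)  # [0.25, -0.5, 1.0]
```

These particular values are exactly representable in float32, so the round trip is lossless; in general float32 precision applies.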
Usage
Call client.embeddings.create() with text input and a model selection. Access vectors via response.data[i].embedding.
Code Reference
Source Location
- Repository: openai-python
- File: src/openai/resources/embeddings.py
- Lines: L1-298
Signature
class Embeddings(SyncAPIResource):
    def create(
        self,
        *,
        input: Union[str, List[str], Iterable[int], Iterable[Iterable[int]]],
        model: Union[str, EmbeddingModel],
        dimensions: int | NotGiven = NOT_GIVEN,
        encoding_format: Literal["float", "base64"] | NotGiven = NOT_GIVEN,
        user: str | NotGiven = NOT_GIVEN,
    ) -> CreateEmbeddingResponse:
        """
        Creates an embedding vector representing the input text.

        Args:
            input: Text to embed (string, list of strings, or token arrays).
            model: Model ID (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002).
            dimensions: Output dimensions (text-embedding-3-* only, truncates output).
            encoding_format: "float" (default) or "base64".
            user: End-user identifier for abuse monitoring.
        """
Import
from openai import OpenAI
# Access via client.embeddings.create()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | str, list[str], Iterable[int], Iterable[Iterable[int]] | Yes | Text or token input to embed |
| model | Union[str, EmbeddingModel] | Yes | text-embedding-3-small, text-embedding-3-large, or text-embedding-ada-002 |
| dimensions | int | No | Output vector dimensions (text-embedding-3-* only) |
| encoding_format | str | No | "float" or "base64" (default "float") |
| user | str | No | End-user identifier for abuse monitoring |
Outputs
| Name | Type | Description |
|---|---|---|
| response.data | list[Embedding] | List of embedding objects |
| response.data[i].embedding | list[float] | Float vector |
| response.data[i].index | int | Position matching input order |
| response.model | str | Model used |
| response.usage | Usage | Token usage (prompt_tokens, total_tokens) |
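The index field mirrors the position of the corresponding input, so results can be re-sorted defensively before vectors are paired back to texts. A sketch with stand-in objects (the Item dataclass here is hypothetical, standing in for the SDK's Embedding model):

```python
from dataclasses import dataclass

@dataclass
class Item:  # stand-in for the SDK's Embedding object
    index: int
    embedding: list[float]

# Simulated response data, deliberately out of order.
data = [Item(2, [0.3]), Item(0, [0.1]), Item(1, [0.2])]

# Sort by the index field to restore input order.
ordered = sorted(data, key=lambda item: item.index)
vectors = [item.embedding for item in ordered]
print(vectors)  # [[0.1], [0.2], [0.3]]
```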
Usage Examples
Single Embedding
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
input="The food was delicious and the service was great.",
model="text-embedding-3-small",
)
embedding = response.data[0].embedding
print(f"Vector dimension: {len(embedding)}")
print(f"Tokens used: {response.usage.total_tokens}")
Batch Embeddings
texts = [
"Machine learning is a subset of AI.",
"Deep learning uses neural networks.",
"Natural language processing handles text.",
]
response = client.embeddings.create(
input=texts,
model="text-embedding-3-small",
)
for item in response.data:
print(f"Text {item.index}: vector[0:3] = {item.embedding[:3]}")
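For large corpora it is common to chunk the input list so each request stays within per-request limits; the batch size of 3 below is an arbitrary illustration, not a documented limit:

```python
from typing import Iterator

def chunked(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

corpus = [f"document {i}" for i in range(7)]
batches = list(chunked(corpus, 3))
print([len(b) for b in batches])  # [3, 3, 1]
# Each batch would then be passed as input= in its own
# client.embeddings.create() call.
```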
Cosine Similarity
import numpy as np
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
response = client.embeddings.create(
input=["cat", "dog", "car"],
model="text-embedding-3-small",
)
vectors = [item.embedding for item in response.data]
print(f"cat-dog similarity: {cosine_similarity(vectors[0], vectors[1]):.3f}")
print(f"cat-car similarity: {cosine_similarity(vectors[0], vectors[2]):.3f}")
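OpenAI embeddings are returned normalized to unit length, so cosine similarity reduces to a plain dot product. A quick check with hand-normalized stand-in vectors (pure standard library, no API call):

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Unit-length stand-ins for two embedding vectors.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

dot = sum(x * y for x, y in zip(a, b))
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
print(math.isclose(dot, cos))  # True: both norms are 1, so the values agree
```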
Dimension Reduction
# text-embedding-3-large default: 3072 dimensions
# Reduce to 256 for storage efficiency
response = client.embeddings.create(
input="Some text",
model="text-embedding-3-large",
dimensions=256,
)
print(f"Vector length: {len(response.data[0].embedding)}") # 256
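OpenAI's documentation notes that a text-embedding-3-* vector can also be shortened client-side by truncating the full vector and renormalizing, approximating what the dimensions parameter does server-side. A sketch in pure Python with a made-up vector:

```python
import math

def shorten(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and renormalize to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.0]  # made-up, already unit length
short = shorten(full, 2)
print(len(short))                                   # 2
print(math.isclose(sum(x * x for x in short), 1.0))  # True
```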