Principle:Mlc_ai_Web_llm_Cosine_Similarity_Search
Overview
Cosine Similarity Search is the technique for finding semantically similar texts by computing cosine similarity (or equivalently, dot product for L2-normalized vectors) between their embedding vectors. This is the core retrieval mechanism that connects embedding generation to practical applications such as semantic search and Retrieval-Augmented Generation (RAG).
Description
After text has been converted into dense vector representations via an embedding model, the task of finding "similar" texts becomes a geometric problem in vector space. Cosine similarity measures the angular closeness between two vectors, producing a score that quantifies semantic relatedness.
The Retrieval Workflow
- Index Phase: Generate embeddings for all documents in the corpus and store them in a vector store (in-memory or persistent)
- Query Phase: When a query arrives, generate its embedding using the same model (with appropriate query formatting)
- Search Phase: Compute similarity scores between the query embedding and all document embeddings
- Ranking Phase: Sort documents by descending similarity score and return the top-k most relevant results
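The four phases above can be sketched as a single in-memory index. This is an illustrative sketch only: the `Embedder` type and `VectorIndex` class are hypothetical names, and the stub embedder stands in for a real embedding model call.

```typescript
// Illustrative stub: a real application would obtain vectors from an embedding model.
type Embedder = (text: string) => number[];

class VectorIndex {
  private vectors: number[][] = [];
  private texts: string[] = [];

  constructor(private embed: Embedder) {}

  // Index phase: embed and store every document.
  add(docs: string[]): void {
    for (const doc of docs) {
      this.texts.push(doc);
      this.vectors.push(this.embed(doc));
    }
  }

  // Query, search, and ranking phases: embed the query, score every
  // document, sort by descending score, and keep the top k.
  search(query: string, k: number): Array<{ text: string; score: number }> {
    const q = this.embed(query);
    return this.vectors
      .map((v, i) => ({
        text: this.texts[i],
        // Dot product = cosine similarity for L2-normalized vectors.
        score: v.reduce((s, x, j) => s + x * q[j], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

The usage examples below show the same flow with real embeddings from web-llm.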
In-Browser Vector Stores
Since web-llm runs entirely in the browser, vector storage must also be client-side. Two approaches are common:
- Manual Computation -- Store embedding vectors in a plain JavaScript array and compute dot products directly. This is simple and sufficient for small corpora (hundreds to low thousands of documents).
- LangChain MemoryVectorStore -- The MemoryVectorStore from langchain/vectorstores/memory provides a ready-made abstraction that handles document storage, embedding computation, and similarity search. It accepts any implementation of the EmbeddingsInterface, making it straightforward to integrate with web-llm.
Theoretical Basis
Cosine Similarity
For two vectors a and b, cosine similarity is defined as:
cosine_similarity(a, b) = dot(a, b) / (||a||_2 * ||b||_2)
where:
- dot(a, b) = sum(a_i * b_i) for i = 1..n
- ||v||_2 = sqrt(sum(v_i^2)) is the L2 (Euclidean) norm
The result is in the range [-1, 1]:
- 1.0 -- vectors point in the same direction (maximally similar)
- 0.0 -- vectors are orthogonal (unrelated)
- -1.0 -- vectors point in opposite directions (maximally dissimilar)
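The definition translates directly into code. The `cosineSimilarity` helper below is an illustrative implementation, not part of any library:

```typescript
// Cosine similarity: dot(a, b) / (||a||_2 * ||b||_2)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The three boundary cases of the [-1, 1] range:
cosineSimilarity([1, 0], [2, 0]);  // → 1.0: same direction
cosineSimilarity([1, 0], [0, 5]);  // → 0.0: orthogonal
cosineSimilarity([1, 0], [-3, 0]); // → -1.0: opposite direction
```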
Dot Product Equivalence for Normalized Vectors
When embedding vectors are L2-normalized (as produced by Snowflake Arctic Embed), ||a||_2 = ||b||_2 = 1, so:
cosine_similarity(a_norm, b_norm) = dot(a_norm, b_norm)
This simplification eliminates the need to compute norms at query time, reducing the per-comparison cost from roughly 3n multiply-accumulate operations (the dot product plus two norms) to n, where n is the embedding dimension.
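The equivalence is easy to verify numerically: normalize each vector once up front, and a plain dot product afterwards reproduces the full cosine formula. The `dot` and `l2Normalize` helpers are illustrative names:

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Divide each component by the vector's L2 norm.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

const a = [3, 4, 0]; // ||a||_2 = 5
const b = [1, 2, 2]; // ||b||_2 = 3

// Full formula: dot / (||a|| * ||b||) = 11 / 15
const full = dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));

// Normalize once at index time, then only a dot product at query time.
const fast = dot(l2Normalize(a), l2Normalize(b));
// full === fast (up to floating-point error)
```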
Top-k Retrieval
Given a query vector q and a corpus of N document vectors D = {d_1, d_2, ..., d_N}:
scores = [dot(q, d_i) for d_i in D]
top_k_indices = argsort(scores, descending=True)[:k]
relevant_docs = [D[i] for i in top_k_indices]
For small corpora (up to several thousand documents), a brute-force linear scan is sufficient. For larger corpora, approximate nearest neighbor (ANN) algorithms such as HNSW or IVF can be used, though these are typically not needed for in-browser applications.
Similarity vs. Distance
Some vector stores use distance rather than similarity. Common distance metrics for normalized vectors:
| Metric | Formula | Relationship to Cosine Similarity |
|---|---|---|
| Cosine distance | 1 - cosine_similarity(a, b) | Lower distance = more similar |
| Euclidean distance | \|\|a - b\|\|_2 | For normalized vectors: sqrt(2 * (1 - dot(a, b))) |
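For L2-normalized vectors, the Euclidean relationship follows from ||a - b||^2 = ||a||^2 + ||b||^2 - 2*dot(a, b) = 2 - 2*dot(a, b). A quick numeric check (helper names illustrative):

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

const a = l2Normalize([1, 2, 3]);
const b = l2Normalize([4, 5, 6]);

// Euclidean distance computed directly from the components.
const euclidean = Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));

// The same value derived from the dot product alone.
const derived = Math.sqrt(2 * (1 - dot(a, b)));
// euclidean === derived (up to floating-point error)
```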
LangChain's MemoryVectorStore uses cosine similarity internally and returns results sorted by descending similarity.
I/O Contract
Input:
- A query embedding vector: number[] of length hidden_size
- A set of document embedding vectors: number[][] where each inner array has length hidden_size
- A parameter k specifying how many results to return
Output:
- The top-k documents ranked by descending similarity score
- Optionally, the similarity scores themselves
Constraints:
- All vectors must have the same dimensionality
- Query and document embeddings should be generated by the same model with appropriate formatting
- For L2-normalized vectors, dot product equals cosine similarity
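The dimensionality constraint is worth enforcing before scoring, since a mismatched vector would silently produce a truncated dot product. A minimal guard (the `assertSameDimension` helper is an illustrative sketch, not a library API):

```typescript
// Reject any document vector whose length differs from the query's.
function assertSameDimension(query: number[], docs: number[][]): void {
  for (const d of docs) {
    if (d.length !== query.length) {
      throw new Error(
        `dimension mismatch: query has ${query.length}, document has ${d.length}`,
      );
    }
  }
}
```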
Usage Examples
Manual Cosine Similarity Computation
import { CreateMLCEngine } from "@mlc-ai/web-llm";
const engine = await CreateMLCEngine("snowflake-arctic-embed-m-q0f32-MLC-b4");
// Helper: compute dot product (= cosine similarity for normalized vectors)
function dotProduct(a: number[], b: number[]): number {
let sum = 0;
for (let i = 0; i < a.length; i++) {
sum += a[i] * b[i];
}
return sum;
}
// Helper: top-k retrieval
function topK(
queryVec: number[],
docVecs: number[][],
docTexts: string[],
k: number,
): Array<{ text: string; score: number }> {
const scored = docVecs.map((vec, i) => ({
text: docTexts[i],
score: dotProduct(queryVec, vec),
}));
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, k);
}
// Embed documents
const QUERY_PREFIX =
"Represent this sentence for searching relevant passages: ";
const documents = [
"WebGPU is a modern graphics and compute API for the web.",
"TypeScript adds static typing to JavaScript.",
"Machine learning models can run in the browser with WebGPU.",
"React is a popular frontend framework.",
];
const formattedDocs = documents.map((d) => `[CLS] ${d} [SEP]`);
const docResult = await engine.embeddings.create({ input: formattedDocs });
const docVecs = docResult.data.map((d) => d.embedding);
// Embed query
const query = "How can I run ML in a web browser?";
const formattedQuery = `[CLS] ${QUERY_PREFIX}${query} [SEP]`;
const queryResult = await engine.embeddings.create({ input: formattedQuery });
const queryVec = queryResult.data[0].embedding;
// Find top-2 most similar documents
const results = topK(queryVec, docVecs, documents, 2);
for (const r of results) {
console.log(`Score: ${r.score.toFixed(4)} | ${r.text}`);
}
// Expected output (approximate):
// Score: 0.8521 | Machine learning models can run in the browser with WebGPU.
// Score: 0.7134 | WebGPU is a modern graphics and compute API for the web.
Using LangChain MemoryVectorStore
import * as webllm from "@mlc-ai/web-llm";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import type { EmbeddingsInterface } from "@langchain/core/embeddings";
// Bridge web-llm to LangChain's EmbeddingsInterface
class WebLLMEmbeddings implements EmbeddingsInterface {
engine: webllm.MLCEngineInterface;
modelId: string;
constructor(engine: webllm.MLCEngineInterface, modelId: string) {
this.engine = engine;
this.modelId = modelId;
}
async embedQuery(document: string): Promise<number[]> {
const reply = await this.engine.embeddings.create({
input: [document],
model: this.modelId,
});
return reply.data[0].embedding;
}
async embedDocuments(documents: string[]): Promise<number[][]> {
const reply = await this.engine.embeddings.create({
input: documents,
model: this.modelId,
});
return reply.data.map((d) => d.embedding);
}
}
// Setup
const modelId = "snowflake-arctic-embed-m-q0f32-MLC-b4";
const engine = await webllm.CreateMLCEngine(modelId);
// Create vector store with embedded documents
const vectorStore = await MemoryVectorStore.fromTexts(
[
"[CLS] The mitochondria is the powerhouse of the cell. [SEP]",
"[CLS] Photosynthesis occurs in chloroplasts. [SEP]",
"[CLS] DNA replication is semi-conservative. [SEP]",
],
[{ id: 1 }, { id: 2 }, { id: 3 }],
new WebLLMEmbeddings(engine, modelId),
);
// Similarity search returns documents sorted by cosine similarity
const prefix =
"Represent this sentence for searching relevant passages: ";
const results = await vectorStore.similaritySearch(
`[CLS] ${prefix}How do cells get energy? [SEP]`,
2,
);
for (const doc of results) {
console.log(`* ${doc.pageContent}`);
}
// Expected: "[CLS] The mitochondria is the powerhouse of the cell. [SEP]"
Computing Raw Similarity Scores
import * as webllm from "@mlc-ai/web-llm";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const modelId = "snowflake-arctic-embed-m-q0f32-MLC-b4";
const engine = await webllm.CreateMLCEngine(modelId);
// Direct similarity computation using MemoryVectorStore's similarity method.
// WebLLMEmbeddings is the bridge class defined in the previous example.
const vectorStore = await MemoryVectorStore.fromExistingIndex(
  new WebLLMEmbeddings(engine, modelId),
);
const queryReply = await engine.embeddings.create({
input: "[CLS] Represent this sentence for searching relevant passages: what is snowflake? [SEP]",
});
const docReply = await engine.embeddings.create({
input: [
"[CLS] The Data Cloud! [SEP]",
"[CLS] Mexico City of Course! [SEP]",
],
});
// Use vectorStore.similarity() for pairwise scores
for (let j = 0; j < docReply.data.length; j++) {
const score = vectorStore.similarity(
queryReply.data[0].embedding,
docReply.data[j].embedding,
);
console.log(`Document ${j}: similarity = ${score.toFixed(4)}`);
}
Related Pages
- Implementation:Mlc_ai_Web_llm_Cosine_Similarity_Vector_Store -- in-browser vector store implementation of this principle
- Principle:Mlc_ai_Web_llm_Text_Embedding_Generation -- generating the vectors used for similarity search
- Principle:Mlc_ai_Web_llm_Embedding_Input_Formatting -- formatting inputs correctly before embedding
- Principle:Mlc_ai_Web_llm_RAG_Pipeline -- using similarity search as the retrieval step in RAG