Principle:Mlc_ai_Web_llm_Cosine_Similarity_Search
Overview
Cosine Similarity Search is the technique for finding semantically similar texts by computing cosine similarity (or equivalently, dot product for L2-normalized vectors) between their embedding vectors. This is the core retrieval mechanism that connects embedding generation to practical applications such as semantic search and Retrieval-Augmented Generation (RAG).
Description
After text has been converted into dense vector representations via an embedding model, the task of finding "similar" texts becomes a geometric problem in vector space. Cosine similarity measures the angular closeness between two vectors, producing a score that quantifies semantic relatedness.
The Retrieval Workflow
- Index Phase: Generate embeddings for all documents in the corpus and store them in a vector store (in-memory or persistent)
- Query Phase: When a query arrives, generate its embedding using the same model (with appropriate query formatting)
- Search Phase: Compute similarity scores between the query embedding and all document embeddings
- Ranking Phase: Sort documents by descending similarity score and return the top-k most relevant results
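The four phases above can be sketched as a single in-memory index. This is an illustrative sketch only: the `Embedder` type and `VectorIndex` class are hypothetical names, and the stub embedder stands in for a real embedding model call.

```typescript
// Illustrative stub: a real application would obtain vectors from an embedding model.
type Embedder = (text: string) => number[];

class VectorIndex {
  private vectors: number[][] = [];
  private texts: string[] = [];

  constructor(private embed: Embedder) {}

  // Index phase: embed and store every document.
  add(docs: string[]): void {
    for (const doc of docs) {
      this.texts.push(doc);
      this.vectors.push(this.embed(doc));
    }
  }

  // Query, search, and ranking phases: embed the query, score every
  // document, sort by descending score, and keep the top k.
  search(query: string, k: number): Array<{ text: string; score: number }> {
    const q = this.embed(query);
    return this.vectors
      .map((v, i) => ({
        text: this.texts[i],
        // Dot product = cosine similarity for L2-normalized vectors.
        score: v.reduce((s, x, j) => s + x * q[j], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

The usage examples below show the same flow with real embeddings from web-llm.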
In-Browser Vector Stores
Since web-llm runs entirely in the browser, vector storage must also be client-side. Two approaches are common:
- Manual Computation -- Store embedding vectors in a plain JavaScript array and compute dot products directly. This is simple and sufficient for small corpora (hundreds to low thousands of documents).
- LangChain MemoryVectorStore -- The MemoryVectorStore from langchain/vectorstores/memory provides a ready-made abstraction that handles document storage, embedding computation, and similarity search. It accepts any implementation of the EmbeddingsInterface, making it straightforward to integrate with web-llm.
Theoretical Basis
Cosine Similarity
For two vectors a and b, cosine similarity is defined as:
cosine_similarity(a, b) = dot(a, b) / (||a||_2 * ||b||_2)
where:
- dot(a, b) = sum(a_i * b_i) for i = 1..n
- ||v||_2 = sqrt(sum(v_i^2)) is the L2 (Euclidean) norm
The result is in the range [-1, 1]:
- 1.0 -- vectors point in the same direction (maximally similar)
- 0.0 -- vectors are orthogonal (unrelated)
- -1.0 -- vectors point in opposite directions (maximally dissimilar)
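The definition translates directly into code. The `cosineSimilarity` helper below is an illustrative implementation, not part of any library:

```typescript
// Cosine similarity: dot(a, b) / (||a||_2 * ||b||_2)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The three boundary cases of the [-1, 1] range:
cosineSimilarity([1, 0], [2, 0]);  // → 1.0: same direction
cosineSimilarity([1, 0], [0, 5]);  // → 0.0: orthogonal
cosineSimilarity([1, 0], [-3, 0]); // → -1.0: opposite direction
```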
Dot Product Equivalence for Normalized Vectors
When embedding vectors are L2-normalized (as produced by Snowflake Arctic Embed), ||a||_2 = ||b||_2 = 1, so:
cosine_similarity(a_norm, b_norm) = dot(a_norm, b_norm)
This simplification eliminates the need to compute norms at query time, reducing the per-comparison cost from roughly 3n multiply-accumulate operations (the dot product plus two norms) to n, where n is the embedding dimension.
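The equivalence is easy to verify numerically: normalize each vector once up front, and a plain dot product afterwards reproduces the full cosine formula. The `dot` and `l2Normalize` helpers are illustrative names:

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Divide each component by the vector's L2 norm.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

const a = [3, 4, 0]; // ||a||_2 = 5
const b = [1, 2, 2]; // ||b||_2 = 3

// Full formula: dot / (||a|| * ||b||) = 11 / 15
const full = dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));

// Normalize once at index time, then only a dot product at query time.
const fast = dot(l2Normalize(a), l2Normalize(b));
// full === fast (up to floating-point error)
```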
Top-k Retrieval
Given a query vector q and a corpus of N document vectors D = {d_1, d_2, ..., d_N}:
scores = [dot(q, d_i) for d_i in D]
top_k_indices = argsort(scores, descending=True)[:k]
relevant_docs = [D[i] for i in top_k_indices]
For small corpora (up to several thousand documents), a brute-force linear scan is sufficient. For larger corpora, approximate nearest neighbor (ANN) algorithms such as HNSW or IVF can be used, though these are typically not needed for in-browser applications.
Similarity vs. Distance
Some vector stores use distance rather than similarity. Common distance metrics for normalized vectors:
| Metric | Formula | Relationship to Cosine Similarity |
|---|---|---|
| Cosine distance | 1 - cosine_similarity(a, b) | Lower distance = more similar |
| Euclidean distance | \|\|a - b\|\|_2 | For normalized vectors: sqrt(2 * (1 - dot(a, b))) |
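For L2-normalized vectors, the Euclidean relationship follows from ||a - b||^2 = ||a||^2 + ||b||^2 - 2*dot(a, b) = 2 - 2*dot(a, b). A quick numeric check (helper names illustrative):

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

const a = l2Normalize([1, 2, 3]);
const b = l2Normalize([4, 5, 6]);

// Euclidean distance computed directly from the components.
const euclidean = Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));

// The same value derived from the dot product alone.
const derived = Math.sqrt(2 * (1 - dot(a, b)));
// euclidean === derived (up to floating-point error)
```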
LangChain's MemoryVectorStore uses cosine similarity internally and returns results sorted by descending similarity.
I/O Contract
Input:
- A query embedding vector: number[] of length hidden_size
- A set of document embedding vectors: number[][] where each inner array has length hidden_size
- A parameter k specifying how many results to return
Output:
- The top-k documents ranked by descending similarity score
- Optionally, the similarity scores themselves
Constraints:
- All vectors must have the same dimensionality
- Query and document embeddings should be generated by the same model with appropriate formatting
- For L2-normalized vectors, dot product equals cosine similarity
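The dimensionality constraint is worth enforcing before scoring, since a mismatched vector would silently produce a truncated dot product. A minimal guard (the `assertSameDimension` helper is an illustrative sketch, not a library API):

```typescript
// Reject any document vector whose length differs from the query's.
function assertSameDimension(query: number[], docs: number[][]): void {
  for (const d of docs) {
    if (d.length !== query.length) {
      throw new Error(
        `dimension mismatch: query has ${query.length}, document has ${d.length}`,
      );
    }
  }
}
```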
Usage Examples
Manual Cosine Similarity Computation
import { CreateMLCEngine } from "@mlc-ai/web-llm";
const engine = await CreateMLCEngine("snowflake-arctic-embed-m-q0f32-MLC-b4");
// Helper: compute dot product (= cosine similarity for normalized vectors)
function dotProduct(a: number[], b: number[]): number {
let sum = 0;
for (let i = 0; i < a.length; i++) {
sum += a[i] * b[i];
}
return sum;
}
// Helper: top-k retrieval
function topK(
queryVec: number[],
docVecs: number[][],
docTexts: string[],
k: number,
): Array<{ text: string; score: number }> {
const scored = docVecs.map((vec, i) => ({
text: docTexts[i],
score: dotProduct(queryVec, vec),
}));
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, k);
}
// Embed documents
const QUERY_PREFIX =
"Represent this sentence for searching relevant passages: ";
const documents = [
"WebGPU is a modern graphics and compute API for the web.",
"TypeScript adds static typing to JavaScript.",
"Machine learning models can run in the browser with WebGPU.",
"React is a popular frontend framework.",
];
const formattedDocs = documents.map((d) => `[CLS] ${d} [SEP]`);
const docResult = await engine.embeddings.create({ input: formattedDocs });
const docVecs = docResult.data.map((d) => d.embedding);
// Embed query
const query = "How can I run ML in a web browser?";
const formattedQuery = `[CLS] ${QUERY_PREFIX}${query} [SEP]`;
const queryResult = await engine.embeddings.create({ input: formattedQuery });
const queryVec = queryResult.data[0].embedding;
// Find top-2 most similar documents
const results = topK(queryVec, docVecs, documents, 2);
for (const r of results) {
console.log(`Score: ${r.score.toFixed(4)} | ${r.text}`);
}
// Expected output (approximate):
// Score: 0.8521 | Machine learning models can run in the browser with WebGPU.
// Score: 0.7134 | WebGPU is a modern graphics and compute API for the web.
Using LangChain MemoryVectorStore
import * as webllm from "@mlc-ai/web-llm";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import type { EmbeddingsInterface } from "@langchain/core/embeddings";
// Bridge web-llm to LangChain's EmbeddingsInterface
class WebLLMEmbeddings implements EmbeddingsInterface {
engine: webllm.MLCEngineInterface;
modelId: string;
constructor(engine: webllm.MLCEngineInterface, modelId: string) {
this.engine = engine;
this.modelId = modelId;
}
async embedQuery(document: string): Promise<number[]> {
const reply = await this.engine.embeddings.create({
input: [document],
model: this.modelId,
});
return reply.data[0].embedding;
}
async embedDocuments(documents: string[]): Promise<number[][]> {
const reply = await this.engine.embeddings.create({
input: documents,
model: this.modelId,
});
return reply.data.map((d) => d.embedding);
}
}
// Setup
const modelId = "snowflake-arctic-embed-m-q0f32-MLC-b4";
const engine = await webllm.CreateMLCEngine(modelId);
// Create vector store with embedded documents
const vectorStore = await MemoryVectorStore.fromTexts(
[
"[CLS] The mitochondria is the powerhouse of the cell. [SEP]",
"[CLS] Photosynthesis occurs in chloroplasts. [SEP]",
"[CLS] DNA replication is semi-conservative. [SEP]",
],
[{ id: 1 }, { id: 2 }, { id: 3 }],
new WebLLMEmbeddings(engine, modelId),
);
// Similarity search returns documents sorted by cosine similarity
const prefix =
"Represent this sentence for searching relevant passages: ";
const results = await vectorStore.similaritySearch(
`[CLS] ${prefix}How do cells get energy? [SEP]`,
2,
);
for (const doc of results) {
console.log(`* ${doc.pageContent}`);
}
// Expected: "[CLS] The mitochondria is the powerhouse of the cell. [SEP]"
Computing Raw Similarity Scores
import * as webllm from "@mlc-ai/web-llm";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const modelId = "snowflake-arctic-embed-m-q0f32-MLC-b4";
const engine = await webllm.CreateMLCEngine(modelId);
// Direct similarity computation using MemoryVectorStore's similarity method.
// WebLLMEmbeddings is the bridge class defined in the previous example.
const vectorStore = await MemoryVectorStore.fromExistingIndex(
  new WebLLMEmbeddings(engine, modelId),
);
const queryReply = await engine.embeddings.create({
input: "[CLS] Represent this sentence for searching relevant passages: what is snowflake? [SEP]",
});
const docReply = await engine.embeddings.create({
input: [
"[CLS] The Data Cloud! [SEP]",
"[CLS] Mexico City of Course! [SEP]",
],
});
// Use vectorStore.similarity() for pairwise scores
for (let j = 0; j < docReply.data.length; j++) {
const score = vectorStore.similarity(
queryReply.data[0].embedding,
docReply.data[j].embedding,
);
console.log(`Document ${j}: similarity = ${score.toFixed(4)}`);
}
Related Pages
- Implementation:Mlc_ai_Web_llm_Cosine_Similarity_Vector_Store -- in-browser vector store implementation of this principle
- Principle:Mlc_ai_Web_llm_Text_Embedding_Generation -- generating the vectors used for similarity search
- Principle:Mlc_ai_Web_llm_Embedding_Input_Formatting -- formatting inputs correctly before embedding
- Principle:Mlc_ai_Web_llm_RAG_Pipeline -- using similarity search as the retrieval step in RAG