# Implementation:Mlc_ai_Web_llm_Cosine_Similarity_Vector_Store
## Overview
Cosine_Similarity_Vector_Store is an External Tool Doc that documents two approaches for computing cosine similarity and performing vector search with web-llm embeddings: manual dot product computation and integration with LangChain's MemoryVectorStore. Neither approach is implemented within web-llm itself; both are user-level patterns demonstrated in the official examples.
## Code Reference
### Approach 1: Manual Dot Product Computation
From `examples/embeddings/src/embeddings.ts` at lines 94-108, the example demonstrates computing pairwise similarity using the `MemoryVectorStore.similarity()` method:
```typescript
// Calculate similarity (we use langchain here, but any method works)
const vectorStore = await MemoryVectorStore.fromExistingIndex(
  new WebLLMEmbeddings(engine, selectedModel),
);

// See score
for (let i = 0; i < queries_og.length; i++) {
  console.log(`Similarity with: ${queries_og[i]}`);
  for (let j = 0; j < documents_og.length; j++) {
    const similarity = vectorStore.similarity(
      queryReply.data[i].embedding,
      docReply.data[j].embedding,
    );
    console.log(`${documents_og[j]}: ${similarity}`);
  }
}
```
A standalone TypeScript implementation without LangChain:
```typescript
// Pure dot product (equivalent to cosine similarity for L2-normalized vectors)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return dot;
}
```
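The dot-product shortcut is only valid when the embedding model emits unit-norm vectors. If that guarantee is absent, the dot product must be divided by both vector norms; a minimal sketch of the full formula (the `cosineSimilarityFull` name is illustrative, not part of web-llm):

```typescript
// Full cosine similarity: dot(a, b) / (||a|| * ||b||).
// Unlike the plain dot product, this stays in [-1, 1] even when the
// input vectors are not L2-normalized.
function cosineSimilarityFull(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For unit vectors the two functions agree exactly; for scaled vectors only the full formula remains a valid cosine similarity.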
### Approach 2: LangChain MemoryVectorStore Integration
From `examples/embeddings/src/embeddings.ts` at lines 112-154:
```typescript
// LangChain EmbeddingsInterface adapter for web-llm
class WebLLMEmbeddings implements EmbeddingsInterface {
  engine: webllm.MLCEngineInterface;
  modelId: string;

  constructor(engine: webllm.MLCEngineInterface, modelId: string) {
    this.engine = engine;
    this.modelId = modelId;
  }

  async _embed(texts: string[]): Promise<number[][]> {
    const reply = await this.engine.embeddings.create({
      input: texts,
      model: this.modelId,
    });
    const result: number[][] = [];
    for (let i = 0; i < texts.length; i++) {
      result.push(reply.data[i].embedding);
    }
    return result;
  }

  async embedQuery(document: string): Promise<number[]> {
    return this._embed([document]).then((embeddings) => embeddings[0]);
  }

  async embedDocuments(documents: string[]): Promise<number[][]> {
    return this._embed(documents);
  }
}
```
This adapter bridges web-llm's `engine.embeddings.create()` to LangChain's `EmbeddingsInterface`, so it can be used with any LangChain vector store.
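Because the adapter only touches `engine.embeddings.create()`, its `embedQuery`/`embedDocuments` contract can be exercised without downloading a model by stubbing that one call. The sketch below is illustrative: the `EngineLike` interface, the stub engine, and its fixed 3-dimensional vectors are assumptions for testing, not part of web-llm.

```typescript
// Minimal structural stand-in for the slice of the engine API the
// adapter uses (web-llm's real type is MLCEngineInterface).
interface EngineLike {
  embeddings: {
    create(req: {
      input: string[];
      model: string;
    }): Promise<{ data: { embedding: number[] }[] }>;
  };
}

// Same shape as the WebLLMEmbeddings adapter above, typed against the stub.
class WebLLMStyleEmbeddings {
  constructor(
    private engine: EngineLike,
    private modelId: string,
  ) {}

  private async _embed(texts: string[]): Promise<number[][]> {
    const reply = await this.engine.embeddings.create({
      input: texts,
      model: this.modelId,
    });
    return reply.data.map((d) => d.embedding);
  }

  embedQuery(text: string): Promise<number[]> {
    return this._embed([text]).then((embeddings) => embeddings[0]);
  }

  embedDocuments(texts: string[]): Promise<number[][]> {
    return this._embed(texts);
  }
}

// Stub engine: "embeds" each text as [text.length, 0, 0].
const stubEngine: EngineLike = {
  embeddings: {
    create: async ({ input }) => ({
      data: input.map((t) => ({ embedding: [t.length, 0, 0] })),
    }),
  },
};

const adapter = new WebLLMStyleEmbeddings(stubEngine, "stub-model");
adapter.embedQuery("abc").then((vec) => console.log(vec)); // vec is [3, 0, 0]
```

Swapping the stub for a real `MLCEngine` requires no changes to the adapter body, which is the point of coding against the narrow interface.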
## External Dependencies
| Package | Purpose | Required? |
|---|---|---|
| `langchain` | `MemoryVectorStore`, `formatDocumentsAsString` | Optional (only for LangChain integration) |
| `@langchain/core` | `EmbeddingsInterface`, `Document`, `PromptTemplate`, `RunnableSequence` | Optional (only for LangChain integration) |
Install with:
```shell
npm install langchain @langchain/core
```
## I/O Contract
### Manual Dot Product
Input:

- Two `number[]` vectors of equal length (the embedding dimension)

Output:

- A single `number` representing the cosine similarity (range: [-1, 1] for L2-normalized vectors)
### `MemoryVectorStore.similaritySearch()`
Input:

- `query: string` -- the query text (will be embedded via `embedQuery()`)
- `k: number` -- number of results to return

Output:

- `Document[]` -- array of LangChain `Document` objects sorted by descending similarity
### `MemoryVectorStore.similarity()`
Input:

- Two `number[]` vectors

Output:

- `number` -- cosine similarity score
## Usage Examples
### Complete Manual Vector Search
```typescript
import { CreateMLCEngine, CreateEmbeddingResponse } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("snowflake-arctic-embed-m-q0f32-MLC-b4");
const QUERY_PREFIX =
  "Represent this sentence for searching relevant passages: ";

// Build a simple in-memory vector index
interface VectorDocument {
  text: string;
  embedding: number[];
}

class SimpleVectorStore {
  private documents: VectorDocument[] = [];

  add(text: string, embedding: number[]): void {
    this.documents.push({ text, embedding });
  }

  search(
    queryEmbedding: number[],
    topK: number,
  ): Array<{ text: string; score: number }> {
    const results = this.documents.map((doc) => ({
      text: doc.text,
      score: this.dotProduct(queryEmbedding, doc.embedding),
    }));
    results.sort((a, b) => b.score - a.score);
    return results.slice(0, topK);
  }

  private dotProduct(a: number[], b: number[]): number {
    let sum = 0;
    for (let i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}

// Index documents
const store = new SimpleVectorStore();
const docs = [
  "JavaScript is a programming language for the web.",
  "Python is popular for data science and machine learning.",
  "Rust provides memory safety without garbage collection.",
  "WebGPU enables GPU-accelerated computation in browsers.",
];
const docFormatted = docs.map((d) => `[CLS] ${d} [SEP]`);
const docEmbeddings: CreateEmbeddingResponse = await engine.embeddings.create({
  input: docFormatted,
});
for (let i = 0; i < docs.length; i++) {
  store.add(docs[i], docEmbeddings.data[i].embedding);
}

// Search
const query = "GPU computing in web browsers";
const queryFormatted = `[CLS] ${QUERY_PREFIX}${query} [SEP]`;
const queryEmbedding: CreateEmbeddingResponse = await engine.embeddings.create({
  input: queryFormatted,
});
const results = store.search(queryEmbedding.data[0].embedding, 2);
for (const r of results) {
  console.log(`[${r.score.toFixed(4)}] ${r.text}`);
}
```
### LangChain MemoryVectorStore with Document Addition
```typescript
import * as webllm from "@mlc-ai/web-llm";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import type { EmbeddingsInterface } from "@langchain/core/embeddings";
import type { Document } from "@langchain/core/documents";

class WebLLMEmbeddings implements EmbeddingsInterface {
  engine: webllm.MLCEngineInterface;
  modelId: string;

  constructor(engine: webllm.MLCEngineInterface, modelId: string) {
    this.engine = engine;
    this.modelId = modelId;
  }

  async embedQuery(text: string): Promise<number[]> {
    const reply = await this.engine.embeddings.create({
      input: [text],
      model: this.modelId,
    });
    return reply.data[0].embedding;
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    const reply = await this.engine.embeddings.create({
      input: texts,
      model: this.modelId,
    });
    return reply.data.map((d) => d.embedding);
  }
}

const modelId = "snowflake-arctic-embed-m-q0f32-MLC-b4";
const engine = await webllm.CreateMLCEngine(modelId);
const embeddings = new WebLLMEmbeddings(engine, modelId);

// Create store and add documents incrementally
const vectorStore = await MemoryVectorStore.fromExistingIndex(embeddings);
const documents: Document[] = [
  { pageContent: "[CLS] The Data Cloud! [SEP]", metadata: { source: "doc1" } },
  {
    pageContent: "[CLS] Mexico City of Course! [SEP]",
    metadata: { source: "doc2" },
  },
];
await vectorStore.addDocuments(documents);

// Perform similarity search
const prefix =
  "Represent this sentence for searching relevant passages: ";
const searchResults = await vectorStore.similaritySearch(
  `[CLS] ${prefix}what is snowflake? [SEP]`,
  1,
);
console.log("Most similar:", searchResults[0].pageContent);
console.log("Metadata:", searchResults[0].metadata);
```
### Similarity Score Matrix
```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("snowflake-arctic-embed-m-q0f32-MLC-b4");

// Compute a full similarity matrix between two sets of texts
async function similarityMatrix(
  queriesFormatted: string[],
  docsFormatted: string[],
): Promise<number[][]> {
  const qResult = await engine.embeddings.create({ input: queriesFormatted });
  const dResult = await engine.embeddings.create({ input: docsFormatted });
  const matrix: number[][] = [];
  for (let i = 0; i < qResult.data.length; i++) {
    const row: number[] = [];
    for (let j = 0; j < dResult.data.length; j++) {
      let dot = 0;
      const qVec = qResult.data[i].embedding;
      const dVec = dResult.data[j].embedding;
      for (let k = 0; k < qVec.length; k++) {
        dot += qVec[k] * dVec[k];
      }
      row.push(dot);
    }
    matrix.push(row);
  }
  return matrix;
}

const PREFIX = "Represent this sentence for searching relevant passages: ";
const queries = [
  `[CLS] ${PREFIX}what is snowflake? [SEP]`,
  `[CLS] ${PREFIX}best tacos? [SEP]`,
];
const docs = [
  "[CLS] The Data Cloud! [SEP]",
  "[CLS] Mexico City of Course! [SEP]",
];

const matrix = await similarityMatrix(queries, docs);
console.log("Similarity matrix:");
console.log(matrix);
// Expected: matrix[0][0] > matrix[0][1] (snowflake -> Data Cloud)
// Expected: matrix[1][1] > matrix[1][0] (tacos -> Mexico City)
```
## Related Pages
- Principle:Mlc_ai_Web_llm_Cosine_Similarity_Search -- the underlying cosine similarity search principle
- Implementation:Mlc_ai_Web_llm_Embeddings_Create -- generating the embedding vectors
- Implementation:Mlc_ai_Web_llm_Embedding_Input_Format -- formatting inputs before embedding
- Implementation:Mlc_ai_Web_llm_Multi_Model_RAG_Engine -- complete RAG pipeline using vector search