Workflow: mlc-ai/web-llm Function Calling
| Knowledge Sources | |
|---|---|
| Domains | LLMs, WebGPU, Function_Calling, Tool_Use |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
End-to-end process for enabling LLM function calling (tool use) in the browser, where the model selects and parameterizes external functions to invoke based on user queries.
Description
This workflow implements the OpenAI-compatible function calling (tool use) pattern in web-llm. The user defines a set of available tools with their function signatures, and the model decides when and how to invoke them based on the conversation context. The model generates structured tool call objects containing the function name and arguments, which the application can then execute. Both streaming and non-streaming modes are supported, with tool calls returned in the final response or final streaming chunk.
Usage
Execute this workflow when building an agentic application where the LLM needs to interact with external APIs, databases, or services. Common use cases include weather queries, search integration, calculator functions, or any scenario where the model should delegate specific operations to deterministic code rather than generating answers directly.
Execution Steps
Step 1: Define Tool Schemas
Create an array of ChatCompletionTool objects describing the available functions. Each tool has a type ("function") and a function descriptor containing the name, description, and parameter schema. The parameter schema follows JSON Schema format, specifying the types, descriptions, and required fields for each argument.
Key considerations:
- Tool descriptions should be clear and specific for accurate model selection
- Parameter schemas use standard JSON Schema with type, properties, required, and enum fields
- Multiple tools can be defined; the model selects which to call based on the query
- Tool names should be descriptive and unique within the array
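A minimal sketch of a tool schema following the points above. The `get_current_weather` function is a hypothetical example; the `ChatCompletionTool` interface here is a local structural stand-in mirroring web-llm's OpenAI-compatible type, which in real code would be imported from `@mlc-ai/web-llm`:

```typescript
// Local structural type mirroring web-llm's OpenAI-compatible
// ChatCompletionTool (assumption: real code imports this from @mlc-ai/web-llm).
interface ChatCompletionTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>; // JSON Schema object
  };
}

// Hypothetical weather tool: clear description, JSON Schema parameters,
// required fields, and an enum-constrained argument.
const tools: ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_current_weather",
      description: "Get the current weather for a given location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name, e.g. Tokyo" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];
```

Additional tools are added as further entries in the same array; the model picks among them by matching the query against each tool's name and description.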
Step 2: Select a Function-Calling Model
Choose a model from the web-llm registry that supports function calling. Models specifically designed for tool use (such as Hermes-2-Pro variants) produce higher quality tool call outputs. Initialize the engine using CreateMLCEngine or a worker-based variant.
Key considerations:
- Models like "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC" are designed for function calling
- Not all models produce reliable function calls; use models explicitly trained for this capability
- The engine setup is identical to basic chat completion
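A setup sketch under the assumptions above. This fragment must run in a WebGPU-capable browser context (not plain Node), and model download can take a while on first load:

```typescript
// Sketch: initialize a function-calling-capable model in the browser.
// Requires WebGPU; the model id comes from the web-llm registry.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine(
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
  // Optional progress callback for download/compile status.
  { initProgressCallback: (report) => console.log(report.text) },
);
```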
Step 3: Build the Tool-Calling Request
Construct a ChatCompletionRequest with the tools array and tool_choice parameter. The messages array contains the conversation context. Set tool_choice to "auto" to let the model decide whether to call a tool, or specify a particular function to force a tool call.
Key considerations:
- tool_choice: "auto" lets the model decide whether to call tools
- tool_choice: "none" prevents tool calling
- tool_choice can specify a particular function to force
- Streaming is supported: set stream: true to get incremental content with tool calls in the final chunk
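The request can be sketched as a plain object in the OpenAI-compatible shape web-llm accepts (again using the hypothetical `get_current_weather` tool from Step 1):

```typescript
// Sketch of a tool-calling request. In real code this object is passed to
// engine.chat.completions.create(); the shape mirrors web-llm's
// OpenAI-compatible ChatCompletionRequest.
const request = {
  messages: [
    { role: "user" as const, content: "What is the weather like in Tokyo?" },
  ],
  tools: [
    {
      type: "function" as const,
      function: {
        name: "get_current_weather", // hypothetical tool from Step 1
        description: "Get the current weather for a given location",
        parameters: {
          type: "object",
          properties: { location: { type: "string" } },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto" as const, // "none" disables tools; an object form forces a specific function
  // stream: true,              // enable to receive incremental chunks (Step 4)
};
```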
Step 4: Execute Inference and Extract Tool Calls
Call engine.chat.completions.create() with the request. The model's response will contain tool_calls in the message (non-streaming) or in the final chunk's delta (streaming). Each tool call includes an ID, the function name, and a JSON string of arguments.
Non-streaming mode:
- Tool calls are in reply.choices[0].message.tool_calls
- Each tool call has: id, type ("function"), function.name, function.arguments
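A sketch of non-streaming extraction. The `reply` shape below is a local structural stand-in for the result of `engine.chat.completions.create(request)`:

```typescript
// Local structural types mirroring the OpenAI-compatible reply shape
// (assumption: real types come from @mlc-ai/web-llm).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface Reply {
  choices: { message: { tool_calls?: ToolCall[] } }[];
}

// Pull the tool calls out of the first choice; an answer with no tool
// use simply has no tool_calls array.
function extractToolCalls(reply: Reply): ToolCall[] {
  return reply.choices[0]?.message.tool_calls ?? [];
}
```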
Streaming mode:
- Content may stream as text, with tool calls appearing in the final chunk
- The last chunk's delta contains the tool_calls array
- Parse function.arguments from the JSON string
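The streaming loop above can be sketched as follows. The chunk shape is a local stand-in for web-llm's OpenAI-compatible streaming deltas; in real code the `AsyncIterable` comes from `engine.chat.completions.create({ ...request, stream: true })`:

```typescript
// Local structural types for streaming chunks (assumption: real types
// come from @mlc-ai/web-llm).
interface StreamToolCall {
  id?: string;
  function?: { name?: string; arguments?: string };
}
interface Chunk {
  choices: { delta: { content?: string; tool_calls?: StreamToolCall[] } }[];
}

// Accumulate streamed text; the tool_calls array arrives in the
// final chunk's delta.
async function consumeStream(chunks: AsyncIterable<Chunk>) {
  let text = "";
  let toolCalls: StreamToolCall[] = [];
  for await (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) text += delta.content;           // incremental text
    if (delta?.tool_calls) toolCalls = delta.tool_calls; // final chunk
  }
  return { text, toolCalls };
}
```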
Step 5: Execute Functions and Return Results
Parse the tool call arguments, execute the corresponding functions in your application code, and optionally continue the conversation by adding the tool results back to the message history. This enables multi-turn tool-augmented conversations where the model can refine its approach based on function outputs.
Key considerations:
- Parse the arguments string with JSON.parse() to get typed parameters
- Execute the corresponding function in your application
- To continue the conversation, append a "tool" role message with the function result
- The model can then synthesize the function output into a natural language response
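The dispatch-and-continue loop can be sketched as below. The `get_current_weather` handler is a hypothetical stub, and the `"tool"` role message shape follows the OpenAI-compatible convention (assumption: web-llm accepts this form for continuing the conversation):

```typescript
// OpenAI-compatible message shapes (local stand-ins).
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical local implementations, keyed by tool name.
const handlers: Record<string, (args: any) => unknown> = {
  get_current_weather: ({ location }: { location: string }) =>
    ({ location, temperature: 21, unit: "celsius" }), // stubbed result
};

// Parse arguments, run the matching handler, and append a "tool" role
// message so the model can synthesize a natural-language reply next turn.
function runToolCall(
  messages: Message[],
  call: { id: string; function: { name: string; arguments: string } },
): Message[] {
  const args = JSON.parse(call.function.arguments); // typed parameters
  const result = handlers[call.function.name](args);
  return [
    ...messages,
    { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) },
  ];
}
```

The returned history is then passed back to `engine.chat.completions.create()` for the follow-up turn.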