Workflow: mlc-ai/web-llm Function Calling
| Knowledge Sources | |
|---|---|
| Domains | LLMs, WebGPU, Function_Calling, Tool_Use |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
End-to-end process for enabling LLM function calling (tool use) in the browser, where the model selects and parameterizes external functions to invoke based on user queries.
Description
This workflow implements the OpenAI-compatible function calling (tool use) pattern in web-llm. The user defines a set of available tools with their function signatures, and the model decides when and how to invoke them based on the conversation context. The model generates structured tool call objects containing the function name and arguments, which the application can then execute. Both streaming and non-streaming modes are supported, with tool calls returned in the final response or final streaming chunk.
Usage
Execute this workflow when building an agentic application where the LLM needs to interact with external APIs, databases, or services. Common use cases include weather queries, search integration, calculator functions, or any scenario where the model should delegate specific operations to deterministic code rather than generating answers directly.
Execution Steps
Step 1: Define Tool Schemas
Create an array of ChatCompletionTool objects describing the available functions. Each tool has a type ("function") and a function descriptor containing the name, description, and parameter schema. The parameter schema follows JSON Schema format, specifying the types, descriptions, and required fields for each argument.
Key considerations:
- Tool descriptions should be clear and specific for accurate model selection
- Parameter schemas use standard JSON Schema with type, properties, required, and enum fields
- Multiple tools can be defined; the model selects which to call based on the query
- Tool names should be descriptive and unique within the array
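A minimal sketch of a tool schema following the points above. The `get_current_weather` function is a hypothetical example; the `ChatCompletionTool` interface here is a local structural stand-in mirroring web-llm's OpenAI-compatible type, which in real code would be imported from `@mlc-ai/web-llm`:

```typescript
// Local structural type mirroring web-llm's OpenAI-compatible
// ChatCompletionTool (assumption: real code imports this from @mlc-ai/web-llm).
interface ChatCompletionTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>; // JSON Schema object
  };
}

// Hypothetical weather tool: clear description, JSON Schema parameters,
// required fields, and an enum-constrained argument.
const tools: ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_current_weather",
      description: "Get the current weather for a given location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name, e.g. Tokyo" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];
```

Additional tools are added as further entries in the same array; the model picks among them by matching the query against each tool's name and description.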
Step 2: Select a Function-Calling Model
Choose a model from the web-llm registry that supports function calling. Models specifically designed for tool use (such as Hermes-2-Pro variants) produce higher quality tool call outputs. Initialize the engine using CreateMLCEngine or a worker-based variant.
Key considerations:
- Models like "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC" are designed for function calling
- Not all models produce reliable function calls; use models explicitly trained for this capability
- The engine setup is identical to basic chat completion
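A setup sketch under the assumptions above. This fragment must run in a WebGPU-capable browser context (not plain Node), and model download can take a while on first load:

```typescript
// Sketch: initialize a function-calling-capable model in the browser.
// Requires WebGPU; the model id comes from the web-llm registry.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine(
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
  // Optional progress callback for download/compile status.
  { initProgressCallback: (report) => console.log(report.text) },
);
```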
Step 3: Build the Tool-Calling Request
Construct a ChatCompletionRequest with the tools array and tool_choice parameter. The messages array contains the conversation context. Set tool_choice to "auto" to let the model decide whether to call a tool, or specify a particular function to force a tool call.
Key considerations:
- tool_choice: "auto" lets the model decide whether to call tools
- tool_choice: "none" prevents tool calling
- tool_choice can specify a particular function to force
- Streaming is supported: set stream: true to get incremental content with tool calls in the final chunk
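The request can be sketched as a plain object in the OpenAI-compatible shape web-llm accepts (again using the hypothetical `get_current_weather` tool from Step 1):

```typescript
// Sketch of a tool-calling request. In real code this object is passed to
// engine.chat.completions.create(); the shape mirrors web-llm's
// OpenAI-compatible ChatCompletionRequest.
const request = {
  messages: [
    { role: "user" as const, content: "What is the weather like in Tokyo?" },
  ],
  tools: [
    {
      type: "function" as const,
      function: {
        name: "get_current_weather", // hypothetical tool from Step 1
        description: "Get the current weather for a given location",
        parameters: {
          type: "object",
          properties: { location: { type: "string" } },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto" as const, // "none" disables tools; an object form forces a specific function
  // stream: true,              // enable to receive incremental chunks (Step 4)
};
```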
Step 4: Execute Inference and Extract Tool Calls
Call engine.chat.completions.create() with the request. The model's response will contain tool_calls in the message (non-streaming) or in the final chunk's delta (streaming). Each tool call includes an ID, the function name, and a JSON string of arguments.
Non-streaming mode:
- Tool calls are in reply.choices[0].message.tool_calls
- Each tool call has: id, type ("function"), function.name, function.arguments
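A sketch of non-streaming extraction. The `reply` shape below is a local structural stand-in for the result of `engine.chat.completions.create(request)`:

```typescript
// Local structural types mirroring the OpenAI-compatible reply shape
// (assumption: real types come from @mlc-ai/web-llm).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface Reply {
  choices: { message: { tool_calls?: ToolCall[] } }[];
}

// Pull the tool calls out of the first choice; an answer with no tool
// use simply has no tool_calls array.
function extractToolCalls(reply: Reply): ToolCall[] {
  return reply.choices[0]?.message.tool_calls ?? [];
}
```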
Streaming mode:
- Content may stream as text, with tool calls appearing in the final chunk
- The last chunk's delta contains the tool_calls array
- Parse function.arguments from the JSON string
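The streaming loop above can be sketched as follows. The chunk shape is a local stand-in for web-llm's OpenAI-compatible streaming deltas; in real code the `AsyncIterable` comes from `engine.chat.completions.create({ ...request, stream: true })`:

```typescript
// Local structural types for streaming chunks (assumption: real types
// come from @mlc-ai/web-llm).
interface StreamToolCall {
  id?: string;
  function?: { name?: string; arguments?: string };
}
interface Chunk {
  choices: { delta: { content?: string; tool_calls?: StreamToolCall[] } }[];
}

// Accumulate streamed text; the tool_calls array arrives in the
// final chunk's delta.
async function consumeStream(chunks: AsyncIterable<Chunk>) {
  let text = "";
  let toolCalls: StreamToolCall[] = [];
  for await (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) text += delta.content;           // incremental text
    if (delta?.tool_calls) toolCalls = delta.tool_calls; // final chunk
  }
  return { text, toolCalls };
}
```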
Step 5: Execute Functions and Return Results
Parse the tool call arguments, execute the corresponding functions in your application code, and optionally continue the conversation by adding the tool results back to the message history. This enables multi-turn tool-augmented conversations where the model can refine its approach based on function outputs.
Key considerations:
- Parse the arguments string with JSON.parse() to get typed parameters
- Execute the corresponding function in your application
- To continue the conversation, append a "tool" role message with the function result
- The model can then synthesize the function output into a natural language response
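The dispatch-and-continue loop can be sketched as below. The `get_current_weather` handler is a hypothetical stub, and the `"tool"` role message shape follows the OpenAI-compatible convention (assumption: web-llm accepts this form for continuing the conversation):

```typescript
// OpenAI-compatible message shapes (local stand-ins).
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical local implementations, keyed by tool name.
const handlers: Record<string, (args: any) => unknown> = {
  get_current_weather: ({ location }: { location: string }) =>
    ({ location, temperature: 21, unit: "celsius" }), // stubbed result
};

// Parse arguments, run the matching handler, and append a "tool" role
// message so the model can synthesize a natural-language reply next turn.
function runToolCall(
  messages: Message[],
  call: { id: string; function: { name: string; arguments: string } },
): Message[] {
  const args = JSON.parse(call.function.arguments); // typed parameters
  const result = handlers[call.function.name](args);
  return [
    ...messages,
    { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) },
  ];
}
```

The returned history is then passed back to `engine.chat.completions.create()` for the follow-up turn.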