Principle:Mlc ai Web llm Tool Call Extraction

From Leeroopedia

Overview

Tool Call Extraction is the process of parsing structured function call information from language model output text into typed tool call objects. After the model generates a response containing tool calls, the raw JSON output must be parsed, validated, and formatted into the standardized tool_calls array used by the OpenAI-compatible API.

Description

When a function-calling-capable model generates a response with tools provided, it outputs a JSON array of objects, each containing a name (the function to call) and arguments (the parameters to pass). The tool call extraction process transforms this raw text into typed ChatCompletionMessageToolCall objects (for non-streaming) or ChatCompletionChunk.Choice.Delta.ToolCall objects (for streaming).

The extraction pipeline performs three critical steps:

  1. JSON parsing -- The raw output message string is parsed as JSON. If parsing fails, a ToolCallOutputParseError is thrown with the original message and underlying error.
  2. Type validation -- The parsed result is verified to be an array. If not, a ToolCallOutputInvalidTypeError is thrown.
  3. Field validation and formatting -- Each element in the array is checked for the required name and arguments fields. If either is missing, a ToolCallOutputMissingFieldsError is thrown. The arguments object is re-serialized to a JSON string (matching the OpenAI convention where arguments are a string, not an object).
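
The three steps above can be sketched as a single function. This is a minimal illustration, not WebLLM's actual source: the function name extractToolCalls, the ToolCall interface, and the local error classes (which only borrow the error names used above) are assumptions made for the example, and only the non-streaming output shape is shown.

```typescript
// Local stand-ins that borrow the error names above; not the library's classes.
class ToolCallOutputParseError extends Error {
  constructor(output: string, cause: unknown) {
    super(`Could not parse tool call output as JSON: ${output} (${String(cause)})`);
    this.name = "ToolCallOutputParseError";
  }
}
class ToolCallOutputInvalidTypeError extends Error {
  constructor(expected: string) {
    super(`Tool call output must be of type ${expected}`);
    this.name = "ToolCallOutputInvalidTypeError";
  }
}
class ToolCallOutputMissingFieldsError extends Error {
  constructor(missing: string[], element: unknown) {
    super(`Missing ${missing.join(", ")} in ${JSON.stringify(element)}`);
    this.name = "ToolCallOutputMissingFieldsError";
  }
}

// Hypothetical shape mirroring the non-streaming tool call object.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

function extractToolCalls(outputMessage: string): ToolCall[] {
  // Step 1: JSON parsing.
  let parsed: unknown;
  try {
    parsed = JSON.parse(outputMessage);
  } catch (err) {
    throw new ToolCallOutputParseError(outputMessage, err);
  }
  // Step 2: type validation (the top-level value must be an array).
  if (!Array.isArray(parsed)) {
    throw new ToolCallOutputInvalidTypeError("array");
  }
  // Step 3: field validation and formatting.
  return parsed.map((element: unknown, i: number) => {
    const record = (element ?? {}) as Record<string, unknown>;
    const missing = ["name", "arguments"].filter((key) => record[key] === undefined);
    if (missing.length > 0) {
      throw new ToolCallOutputMissingFieldsError(missing, element);
    }
    return {
      id: String(i),
      type: "function" as const,
      function: {
        name: String(record.name),
        // Re-serialize so that arguments is a string, per the OpenAI convention.
        arguments: JSON.stringify(record.arguments),
      },
    };
  });
}
```

Each failure mode maps to exactly one error type, so a caller can tell invalid JSON apart from valid JSON with the wrong shape.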

After extraction, the engine sets finish_reason to "tool_calls" (instead of "stop") to signal that the response contains tool invocations rather than text content.

Usage

Tool call extraction happens automatically inside the engine's chatCompletion method. Application code does not call the extraction function directly. The parsed tool calls are available in the response:

Non-streaming:

const reply = await engine.chat.completions.create(request);
const toolCalls = reply.choices[0].message.tool_calls;
// Array<ChatCompletionMessageToolCall> | undefined

Streaming:

const stream = await engine.chat.completions.create({ ...request, stream: true });
let lastChunk;
for await (const chunk of stream) {
  lastChunk = chunk;
}
// lastChunk stays undefined if the stream yields no chunks, so guard the access
const toolCalls = lastChunk?.choices[0]?.delta.tool_calls;
// Array<ChatCompletionChunk.Choice.Delta.ToolCall> | undefined

When extraction occurs:

  • Only when request.tools is defined and not null
  • Only when generation itself ended with finish_reason "stop" (successful completion); the engine rewrites the value to "tool_calls" only after extraction. If generation terminated due to "length" or "abort", tool calls are not extracted (incomplete output cannot be reliably parsed).
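
The two conditions can be read as a single guard. The sketch below is illustrative only: shouldExtractToolCalls, finalFinishReason, RequestLike, and the FinishReason union are hypothetical names, not part of the WebLLM API, and finalFinishReason assumes extraction itself succeeded.

```typescript
type FinishReason = "stop" | "length" | "abort" | "tool_calls";

// Hypothetical minimal request shape; only the tools field matters here.
interface RequestLike {
  tools?: unknown[] | null;
}

// True only when tools were supplied and generation completed normally.
function shouldExtractToolCalls(request: RequestLike, finishReason: FinishReason): boolean {
  const hasTools = request.tools !== undefined && request.tools !== null;
  return hasTools && finishReason === "stop";
}

// After a successful extraction, "stop" is reported as "tool_calls";
// every other finish reason is passed through unchanged.
function finalFinishReason(request: RequestLike, finishReason: FinishReason): FinishReason {
  return shouldExtractToolCalls(request, finishReason) ? "tool_calls" : finishReason;
}
```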

Error handling: Applications should be prepared for extraction errors. If the model produces malformed JSON (unlikely with grammar-constrained output, but possible), extraction throws one of the typed errors listed under Error conditions, and the error propagates to the caller of the completion request.
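
One way to handle these failures is to branch on the error's name. The sketch below assumes the thrown errors expose their class name through the standard Error name property, which this page does not confirm. The helpers isToolCallExtractionError and withToolCallFallback are hypothetical, not WebLLM exports.

```typescript
// Error names taken from this page; matching on name is an assumption.
const TOOL_CALL_ERROR_NAMES = [
  "ToolCallOutputParseError",
  "ToolCallOutputInvalidTypeError",
  "ToolCallOutputMissingFieldsError",
] as const;

function isToolCallExtractionError(err: unknown): err is Error {
  return err instanceof Error && TOOL_CALL_ERROR_NAMES.some((name) => name === err.name);
}

// Hypothetical wrapper: runs a request and falls back only on extraction errors.
async function withToolCallFallback<T>(
  run: () => Promise<T>,
  fallback: (err: Error) => T,
): Promise<T> {
  try {
    return await run();
  } catch (err) {
    if (isToolCallExtractionError(err)) {
      return fallback(err);
    }
    throw err; // Unrelated errors are not swallowed.
  }
}
```

A caller could pass () => engine.chat.completions.create(request) as run and, in fallback, retry the request or surface the problem to the user.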

Theoretical Basis

Tool call extraction bridges the gap between the model's text generation capability and the application's need for structured data. The process relies on several key design decisions:

  1. JSON as interchange format -- Function calls use JSON because it is well-defined, widely supported, and can be grammar-constrained during generation. The model's output is constrained by the officialHermes2FunctionCallSchemaArray JSON schema.
  2. String-typed arguments -- Following the OpenAI convention, arguments is stored as a JSON string rather than a parsed object. This preserves the exact serialization from the model and defers deserialization to the application, which knows the expected parameter types.
  3. ID assignment -- Each tool call receives a positional identifier derived from its array index: an id field holding the index as a string for non-streaming, or a numeric index field for streaming. This identifier is used to correlate tool results back to specific calls in multi-tool scenarios.
  4. Streaming vs. non-streaming -- The extraction function is overloaded to produce different output types based on the streaming context:
    • Non-streaming: ChatCompletionMessageToolCall with id: string
    • Streaming: ChatCompletionChunk.Choice.Delta.ToolCall with index: number
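
The overload described in item 4 can be sketched with TypeScript function overloads keyed on a literal boolean. The interfaces and the name formatToolCalls below are simplified, hypothetical stand-ins for the library's types, shown only to illustrate how the return type follows the streaming flag.

```typescript
// Already-validated element of the model's JSON array.
interface ParsedCall {
  name: string;
  arguments: unknown;
}

// Simplified stand-ins for the two output types named above.
interface MessageToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface DeltaToolCall {
  index: number;
  type: "function";
  function: { name: string; arguments: string };
}

// Overload signatures: the literal type of isStreaming selects the return type.
function formatToolCalls(calls: ParsedCall[], isStreaming: false): MessageToolCall[];
function formatToolCalls(calls: ParsedCall[], isStreaming: true): DeltaToolCall[];
function formatToolCalls(
  calls: ParsedCall[],
  isStreaming: boolean,
): MessageToolCall[] | DeltaToolCall[] {
  if (isStreaming) {
    return calls.map((call, i): DeltaToolCall => ({
      index: i, // numeric position for streaming deltas
      type: "function",
      function: { name: call.name, arguments: JSON.stringify(call.arguments) },
    }));
  }
  return calls.map((call, i): MessageToolCall => ({
    id: String(i), // array index as a string for non-streaming
    type: "function",
    function: { name: call.name, arguments: JSON.stringify(call.arguments) },
  }));
}
```

With this shape, formatToolCalls(calls, false) is typed as an array with id fields and formatToolCalls(calls, true) as an array with index fields, so callers need no casts.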

The extraction runs only when the model stops normally (finish_reason === "stop"). This prevents parsing truncated JSON from length-limited or aborted generations.

I/O Contract

Input:

  • Raw model output string (expected to be a valid JSON array of {"name": string, "arguments": object} objects)
  • Boolean flag indicating streaming or non-streaming mode

Output (non-streaming):

  • Array<ChatCompletionMessageToolCall> where each element has:
    • id: string -- Array index as string (e.g., "0", "1")
    • function.name: string -- The function name
    • function.arguments: string -- JSON-serialized arguments
    • type: "function" -- Literal discriminator

Output (streaming):

  • Array<ChatCompletionChunk.Choice.Delta.ToolCall> where each element has:
    • index: number -- Array index as number
    • function.name: string -- The function name
    • function.arguments: string -- JSON-serialized arguments
    • type: "function" -- Literal discriminator

Error conditions:

| Error Type | Condition | Description |
|---|---|---|
| ToolCallOutputParseError | JSON.parse fails | Output is not valid JSON |
| ToolCallOutputInvalidTypeError | Parsed result is not an array | Output is valid JSON but not an array |
| ToolCallOutputMissingFieldsError | Element missing name or arguments | Array element lacks required fields |

Usage Examples

Accessing tool calls from a non-streaming response:

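import * as webllm from "@mlc-ai/web-llm";

// Assumes `engine` is an already-initialized WebLLM engine and `tools` is an
// array of tool definitions (for example, one describing get_current_weather).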
const reply = await engine.chat.completions.create({
  stream: false,
  messages: [
    { role: "user", content: "What is the weather in Pittsburgh and Tokyo?" },
  ],
  tools: tools,
  tool_choice: "auto",
});

if (reply.choices[0].finish_reason === "tool_calls") {
  const toolCalls = reply.choices[0].message.tool_calls!;
  for (const call of toolCalls) {
    console.log(`Tool call ID: ${call.id}`);
    console.log(`Function: ${call.function.name}`);
    console.log(`Arguments: ${call.function.arguments}`);
    // Arguments is a JSON string, parse it:
    const args = JSON.parse(call.function.arguments);
    console.log(`Parsed location: ${args.location}`);
  }
}
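
Once the calls have been read as in the loop above, each id can be used to pair a result with the call that produced it. The following is a minimal sketch, assuming the OpenAI-style tool message shape (role "tool" with a tool_call_id). The runToolCalls function, the handler map, and the interfaces are hypothetical names introduced for illustration, not WebLLM exports.

```typescript
// Simplified, hypothetical shapes for the extracted call and the reply message.
interface ToolCallLike {
  id: string;
  function: { name: string; arguments: string };
}
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}
type ToolHandler = (args: Record<string, unknown>) => unknown;

function runToolCalls(calls: ToolCallLike[], handlers: Map<string, ToolHandler>): ToolMessage[] {
  return calls.map((call) => {
    const handler = handlers.get(call.function.name);
    // arguments arrives as a JSON string, so it is parsed just before dispatch.
    const result = handler
      ? handler(JSON.parse(call.function.arguments))
      : { error: `Unknown function: ${call.function.name}` };
    // tool_call_id ties this result back to the call it answers.
    return { role: "tool" as const, tool_call_id: call.id, content: JSON.stringify(result) };
  });
}
```

The returned messages could then be appended to the conversation before the next completion request, so that the model sees one result per call.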

Accessing tool calls from a streaming response:

const stream = await engine.chat.completions.create({
  stream: true,
  stream_options: { include_usage: true },
  messages: [
    { role: "user", content: "What is the weather in Pittsburgh and Tokyo?" },
  ],
  tools: tools,
  tool_choice: "auto",
});

let lastChunk: webllm.ChatCompletionChunk | undefined;
for await (const chunk of stream) {
  if (!chunk.usage) {
    lastChunk = chunk;
  }
}

// The last non-usage chunk contains tool_calls in its delta
if (lastChunk) {
  const delta = lastChunk.choices[0].delta;
  if (delta.tool_calls) {
    for (const call of delta.tool_calls) {
      console.log(`Index: ${call.index}`);
      console.log(`Function: ${call.function?.name}`);
      console.log(`Arguments: ${call.function?.arguments}`);
    }
  }
}

What the model output looks like before extraction:

[
  {
    "name": "get_current_weather",
    "arguments": {
      "location": "Pittsburgh, PA",
      "unit": "celsius"
    }
  },
  {
    "name": "get_current_weather",
    "arguments": {
      "location": "Tokyo, Japan",
      "unit": "celsius"
    }
  }
]

Extraction transforms this into ChatCompletionMessageToolCall objects (or ChatCompletionChunk.Choice.Delta.ToolCall objects when streaming), with each arguments object re-serialized as a JSON string.
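
For the two-element output above, the result can be reproduced with plain JSON.parse and JSON.stringify. This sketch only mirrors the documented non-streaming output shape and does not call WebLLM; rawOutput, parsedOutput, and toolCalls are illustrative names.

```typescript
// The model output from the example above, as a raw string.
const rawOutput = `[
  {"name": "get_current_weather", "arguments": {"location": "Pittsburgh, PA", "unit": "celsius"}},
  {"name": "get_current_weather", "arguments": {"location": "Tokyo, Japan", "unit": "celsius"}}
]`;

const parsedOutput = JSON.parse(rawOutput) as { name: string; arguments: object }[];

// Mirror the documented non-streaming shape: string id, string arguments.
const toolCalls = parsedOutput.map((call, i) => ({
  id: String(i),
  type: "function" as const,
  function: { name: call.name, arguments: JSON.stringify(call.arguments) },
}));

console.log(toolCalls[1].id);
// prints: 1
console.log(toolCalls[0].function.arguments);
// prints: {"location":"Pittsburgh, PA","unit":"celsius"}
```

Note that the re-serialized string is compact: the whitespace of the model's original output is not preserved by JSON.stringify.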
