Implementation:Datajuicer Data juicer DJ MCP Granular Ops
| Knowledge Sources | |
|---|---|
| Domains | Tooling |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete tool, provided by Data-Juicer, that exposes each Data-Juicer operator as an individual MCP tool.
Description
DJ_MCP_Granular_Ops dynamically builds an MCP (Model Context Protocol) server that exposes each Data-Juicer operator as a separate callable tool, giving AI assistants fine-grained access to operators. It uses OPSearcher to discover all available operators. For each operator, create_operator_function dynamically generates a function whose signature combines dataset_path, export_path, and the operator's own parameters. The generated functions are registered as tools on a FastMCP server. As a special case, process_parameter converts jsonargparse types into pydantic-compatible annotations.
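The signature-building step described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the Data-Juicer implementation: ToyLengthFilter and make_tool_function are hypothetical names, and the returned dictionary stands in for whatever the real tool does with the dataset.

```python
import inspect
from typing import Optional


class ToyLengthFilter:
    """Hypothetical stand-in for a Data-Juicer operator."""

    def __init__(self, min_len: int = 1, max_len: int = 100):
        self.min_len = min_len
        self.max_len = max_len


def make_tool_function(op_cls):
    """Build a function whose signature is dataset_path, export_path,
    followed by the constructor parameters of op_cls (hypothetical helper)."""
    fixed = [
        inspect.Parameter(
            "dataset_path", inspect.Parameter.POSITIONAL_OR_KEYWORD, annotation=str
        ),
        inspect.Parameter(
            "export_path",
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            default=None,
            annotation=Optional[str],
        ),
    ]
    skipped_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    # Operator parameters become keyword-only so they cannot clash with the
    # ordering rules for the two fixed parameters.
    op_params = [
        p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
        for name, p in inspect.signature(op_cls.__init__).parameters.items()
        if name != "self" and p.kind not in skipped_kinds
    ]
    signature = inspect.Signature(fixed + op_params)

    def tool(*args, **kwargs):
        # Validate the call against the synthesized signature.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        arguments = dict(bound.arguments)
        dataset_path = arguments.pop("dataset_path")
        export_path = arguments.pop("export_path")
        op = op_cls(**arguments)
        # Placeholder result: a real tool would load, process, and export data.
        return {"dataset_path": dataset_path, "export_path": export_path, "op": op}

    tool.__name__ = op_cls.__name__
    tool.__doc__ = op_cls.__doc__
    # inspect.signature() honors __signature__, so frameworks that introspect
    # the function see the synthesized parameters instead of *args/**kwargs.
    tool.__signature__ = signature
    return tool


toy_tool = make_tool_function(ToyLengthFilter)
print(inspect.signature(toy_tool))
```

Setting `__signature__` is what lets a tool framework that inspects function signatures derive a per-operator input schema from a single generic wrapper.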
Usage
Use this when AI agents should invoke individual Data-Juicer operators directly as MCP tools, without requiring a full recipe configuration.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/tools/DJ_mcp_granular_ops.py
Signature
def process_parameter(name: str, param: inspect.Parameter) -> inspect.Parameter:
def create_operator_function(op, mcp):
def create_mcp_server(port: str = "8000"):
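To illustrate the kind of rewrite that a function with the shape of process_parameter can perform, the sketch below replaces a parameter's annotation with a plainer type via inspect.Parameter.replace. It is an assumption-laden illustration, not the project's code: simplify_parameter, the _BASE_TYPES mapping, and the local PositiveInt class are all hypothetical, and the real conversion rules for jsonargparse types may differ.

```python
import inspect
from typing import Any


class PositiveInt(int):
    """Hypothetical stand-in for a restricted integer type."""


# Hypothetical mapping from restricted type names to plain base types.
_BASE_TYPES = {"PositiveInt": int, "PositiveFloat": float}


def simplify_parameter(name: str, param: inspect.Parameter) -> inspect.Parameter:
    """Return a copy of param whose annotation is a plain type.

    name is kept only to mirror the documented signature of
    process_parameter; this sketch does not use it.
    """
    annotation = param.annotation
    if annotation is inspect.Parameter.empty:
        # Unannotated parameters accept anything.
        return param.replace(annotation=Any)
    type_name = getattr(annotation, "__name__", None)
    if type_name in _BASE_TYPES:
        return param.replace(annotation=_BASE_TYPES[type_name])
    return param


def example_op(a: PositiveInt, b, c: str = "x"):
    """Hypothetical function whose parameters are rewritten below."""


example_params = inspect.signature(example_op).parameters
rewritten = [simplify_parameter(n, p) for n, p in example_params.items()]
print(inspect.Signature(rewritten))
```

Note that this simplified mapping drops the constraint itself (positivity); a fuller conversion would carry it over into the target annotation.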
Import
from data_juicer.tools.DJ_mcp_granular_ops import create_mcp_server
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| port | str | No | Port number for the MCP server. Default: "8000" |
| DJ_OPS_LIST_PATH | env var | No | Path to a file listing operator names to expose (one per line). If unset, all operators are exposed |
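The DJ_OPS_LIST_PATH behaviour in the table can be sketched as follows. The helper names load_ops_allowlist and select_ops are hypothetical, and skipping blank lines and trimming whitespace are assumptions about how such a file would reasonably be parsed rather than documented behaviour.

```python
import os
from typing import Iterable, List, Optional, Set


def load_ops_allowlist(env_var: str = "DJ_OPS_LIST_PATH") -> Optional[Set[str]]:
    """Read operator names (one per line) from the file named by env_var.

    Returns None when the variable is unset or empty, meaning "no restriction".
    """
    path = os.environ.get(env_var)
    if not path:
        return None
    with open(path, encoding="utf-8") as handle:
        # Assumption: surrounding whitespace and blank lines are ignored.
        return {line.strip() for line in handle if line.strip()}


def select_ops(all_ops: Iterable[str], allowlist: Optional[Set[str]]) -> List[str]:
    """Keep every operator when there is no allowlist, else only listed ones."""
    if allowlist is None:
        return list(all_ops)
    return [name for name in all_ops if name in allowlist]
```

Returning None rather than an empty set keeps "variable unset" distinct from "file present but empty", which would otherwise expose no operators at all.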
Outputs
| Name | Type | Description |
|---|---|---|
| mcp | FastMCP | A configured FastMCP server instance with all operators registered as tools |
Usage Examples
# Start the granular-ops MCP server programmatically
from data_juicer.tools.DJ_mcp_granular_ops import create_mcp_server
mcp = create_mcp_server(port="8000")
mcp.run(transport="sse")
# Or from command line:
# python -m data_juicer.tools.DJ_mcp_granular_ops --port 8000