

Implementation:Langgenius Dify CreateEmptyDataset

From Leeroopedia
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| Dify | RAG, Knowledge_Management, Frontend | 2026-02-12 00:00 GMT |

Overview

Description

createEmptyDataset is a frontend service function that creates a new, empty knowledge base (dataset) in Dify. It issues a POST request to the /datasets endpoint with only a name, and the backend provisions the full dataset resource with default configuration for indexing, permissions, embedding model, and retrieval model.

This function represents the simplest entry point into the Knowledge Base Management workflow. Once the dataset is created, documents can be uploaded, chunked, indexed, and queried within it.
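To make the request shape concrete, here is a minimal sketch of what a call like createEmptyDataset sends. It assumes the underlying post helper JSON-encodes the body and sets a JSON content type. The names postJson, createEmptyDatasetSketch, FetchLike, and the base URL are hypothetical and introduced only for illustration; they are not Dify APIs. The fetch function is injected so the sketch runs without a server.

```typescript
// Hypothetical fetch-like signature so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json: () => Promise<unknown> }>

// Hypothetical stand-in for a JSON POST helper (not Dify's actual `post`).
async function postJson<T>(fetchFn: FetchLike, apiBase: string, path: string, body: unknown): Promise<T> {
  const res = await fetchFn(`${apiBase}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  })
  // Only the HTTP status is checked; error payloads are not parsed here.
  if (!res.ok)
    throw new Error(`POST ${path} failed with status ${res.status}`)
  return (await res.json()) as T
}

// Same request shape as the documented function: POST /datasets with only { name }.
// The return type is a simplified subset of DataSet.
function createEmptyDatasetSketch(fetchFn: FetchLike, apiBase: string, name: string) {
  return postJson<{ id: string; name: string }>(fetchFn, apiBase, '/datasets', { name })
}
```

Injecting the fetch implementation keeps the sketch testable. A production helper would also need to handle authentication and structured error responses, which this sketch leaves out.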

Usage

  • Call createEmptyDataset when the user initiates the "Create Knowledge Base" action from the UI.
  • The returned DataSet object provides the id needed to add documents, configure settings, or attach the dataset to applications.
  • Typically followed by a call to createFirstDocument or createDocument to populate the newly created dataset.
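The usage sequence listed above (create the dataset, then reuse its id to populate it) can be sketched as a small orchestration function. Both dependencies are hypothetical stand-ins injected for illustration: createEmptyDataset mirrors the documented signature, while addDocuments represents whatever call populates the dataset, for example a wrapper around createDocument. The trim-and-reject-empty check is an assumed client-side guard, not documented Dify behavior.

```typescript
// Simplified subset of DataSet used by this sketch.
type CreatedDataset = { id: string; name: string }

// Hypothetical dependencies, injected so the flow runs without Dify.
type KnowledgeBaseDeps = {
  createEmptyDataset: (args: { name: string }) => Promise<CreatedDataset>
  addDocuments: (datasetId: string, fileIds: string[]) => Promise<void>
}

// Creates the dataset first, then passes its id to the follow-up call.
async function createKnowledgeBase(name: string, fileIds: string[], deps: KnowledgeBaseDeps): Promise<string> {
  // Assumed client-side guard: reject names that are empty after trimming.
  const trimmed = name.trim()
  if (trimmed.length === 0)
    throw new Error('Knowledge base name must not be empty')

  const dataset = await deps.createEmptyDataset({ name: trimmed })

  // Only populate the dataset when there is something to add.
  if (fileIds.length > 0)
    await deps.addDocuments(dataset.id, fileIds)

  return dataset.id
}
```

Because the dataset id is only known after the first call resolves, the two requests must run sequentially rather than in parallel.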

Code Reference

Source Location

web/service/datasets.ts, lines 84-86.

Signature

export const createEmptyDataset = ({ name }: { name: string }): Promise<DataSet> => {
  return post<DataSet>('/datasets', { body: { name } })
}

Import

import { createEmptyDataset } from '@/service/datasets'

I/O Contract

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable name for the new dataset. |

Outputs

Returns Promise<DataSet>. Key fields of the DataSet type:

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the created dataset. |
| name | string | Name of the dataset as provided in the request. |
| indexing_status | DocumentIndexingStatus | Current indexing status of the dataset. |
| permission | DatasetPermission | Access control level: only_me, all_team_members, or partial_members. |
| doc_form | ChunkingMode | Default chunking mode: text_model, qa_model, or hierarchical_model. |
| runtime_mode | 'rag_pipeline' \| 'general' | Whether the dataset operates as a standard knowledge base or a RAG pipeline. |
| embedding_model | string | Name of the embedding model assigned to the dataset. |
| embedding_model_provider | string | Provider of the embedding model. |
| retrieval_model | RetrievalConfig | Default retrieval configuration, including search method, top_k, and score threshold. |
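For code that consumes the response, the fields in the table above can be mirrored as a local TypeScript type. This is a simplified sketch, not Dify's actual definitions: string unions stand in for the DatasetPermission and ChunkingMode enums, RetrievalConfig is reduced to the three settings the table names, indexing_status is omitted, and documentDefaultsFrom is a hypothetical helper that gathers the per-dataset defaults a follow-up document request can reuse.

```typescript
// Local stand-ins for Dify's enums (values taken from the table above).
type DatasetPermission = 'only_me' | 'all_team_members' | 'partial_members'
type ChunkingMode = 'text_model' | 'qa_model' | 'hierarchical_model'

// Reduced retrieval config: only the settings the table mentions.
type RetrievalConfig = {
  search_method: string
  top_k: number
  score_threshold: number
}

// Simplified subset of the DataSet response.
type DataSetSubset = {
  id: string
  name: string
  permission: DatasetPermission
  doc_form: ChunkingMode
  runtime_mode: 'rag_pipeline' | 'general'
  embedding_model: string
  embedding_model_provider: string
  retrieval_model: RetrievalConfig
}

// Hypothetical helper: collects defaults that a document request can reuse.
function documentDefaultsFrom(ds: DataSetSubset) {
  return {
    doc_form: ds.doc_form,
    embedding_model: ds.embedding_model,
    embedding_model_provider: ds.embedding_model_provider,
    // Shallow copy so edits to the request do not mutate the dataset object.
    retrieval_model: { ...ds.retrieval_model },
  }
}
```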

Usage Examples

Creating a new knowledge base

import { createEmptyDataset } from '@/service/datasets'

const dataset = await createEmptyDataset({ name: 'Product Documentation' })
console.log(dataset.id) // Use this ID for subsequent document uploads

Creating a dataset and immediately adding a document

import { createEmptyDataset, createDocument } from '@/service/datasets'

// Assumed to be in scope: fileId is the ID of a file that has already been uploaded,
// and processRule is the process rule (chunking settings) prepared elsewhere.
const dataset = await createEmptyDataset({ name: 'FAQ Knowledge Base' })

// Add the first document to the new dataset, reusing the dataset's
// default embedding and retrieval settings.
const docResponse = await createDocument({
  datasetId: dataset.id,
  body: {
    data_source: {
      type: 'upload_file',
      info_list: {
        data_source_type: 'upload_file',
        file_info_list: { file_ids: [fileId] },
      },
    },
    doc_form: 'text_model',
    doc_language: 'English',
    process_rule: processRule,
    retrieval_model: dataset.retrieval_model,
    embedding_model: dataset.embedding_model,
    embedding_model_provider: dataset.embedding_model_provider,
  },
})
