Implementation:Langgenius Dify CreateEmptyDataset

Knowledge Sources	Domains	Last Updated
Dify	RAG, Knowledge_Management, Frontend	2026-02-12 00:00 GMT

Overview

Description

createEmptyDataset is a frontend service function that creates a new, empty knowledge base (dataset) in Dify. It issues a POST request to the /datasets endpoint with only a name, and the backend provisions the full dataset resource with default configuration for indexing, permissions, embedding model, and retrieval model.

This function represents the simplest entry point into the Knowledge Base Management workflow. Once the dataset is created, documents can be uploaded, chunked, indexed, and queried within it.
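To make the request shape concrete, here is a minimal sketch of the call described above. It is not Dify's `post` wrapper: `RequestSketch` and `buildCreateDatasetRequest` are hypothetical names introduced only for illustration, and the sketch leaves out base URLs, headers, and authentication.

```typescript
// Hypothetical description of an HTTP call; not part of Dify's service layer.
interface RequestSketch {
  method: 'POST'
  path: string
  body: string
}

// Builds the request that createEmptyDataset is described as sending:
// a POST to /datasets whose JSON body carries only the dataset name.
function buildCreateDatasetRequest(datasetName: string): RequestSketch {
  return {
    method: 'POST',
    path: '/datasets',
    body: JSON.stringify({ name: datasetName }),
  }
}

const requestSketch = buildCreateDatasetRequest('Product Documentation')
console.log(requestSketch.method, requestSketch.path, requestSketch.body)
```

Everything else on the resulting dataset (indexing, permissions, embedding model, retrieval model) is filled in by the backend, which is why the body carries a single field.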

Usage

Call createEmptyDataset when the user initiates the "Create Knowledge Base" action from the UI.
The returned DataSet object provides the id needed to add documents, configure settings, or attach the dataset to applications.
Typically followed by a call to createFirstDocument or createDocument to populate the newly created dataset.
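The flow in the points above could be wired into a UI handler roughly as follows. This is a sketch under stated assumptions: `handleCreateKnowledgeBase`, `CreatedDataset`, and `fakeCreate` are hypothetical, the blank-name check is an illustrative client-side guard rather than documented Dify behavior, and the creator function is injected so the example runs without the real service.

```typescript
// Minimal stand-in for the DataSet type; only the fields used here.
interface CreatedDataset {
  id: string
  name: string
}

// Same shape as the documented createEmptyDataset signature.
type CreateDataset = (params: { name: string }) => Promise<CreatedDataset>

// Hypothetical handler: rejects blank names (an assumption; the backend may
// apply its own validation), creates the dataset, and returns its id.
async function handleCreateKnowledgeBase(
  rawName: string,
  create: CreateDataset,
): Promise<string> {
  const trimmed = rawName.trim()
  if (trimmed.length === 0)
    throw new Error('Dataset name must not be empty')
  const dataset = await create({ name: trimmed })
  return dataset.id
}

// Stub used in place of the real service call so the sketch is self-contained.
const fakeCreate: CreateDataset = async params => ({ id: 'dataset-1', name: params.name })

handleCreateKnowledgeBase('  FAQ Knowledge Base  ', fakeCreate)
  .then(id => console.log(id))
```

In application code, the imported `createEmptyDataset` would be passed in place of `fakeCreate`, and the returned id would feed the subsequent document-creation call.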

Code Reference

Source Location

web/service/datasets.ts, lines 84-86.

Signature

export const createEmptyDataset = ({ name }: { name: string }): Promise<DataSet> => {
  return post<DataSet>('/datasets', { body: { name } })
}

Import

import { createEmptyDataset } from '@/service/datasets'

I/O Contract

Inputs

Parameter	Type	Required	Description
`name`	`string`	Yes	Human-readable name for the new dataset.

Outputs

Returns Promise<DataSet>. Key fields of the DataSet type:

Field	Type	Description
`id`	`string`	Unique identifier for the created dataset.
`name`	`string`	Name of the dataset as provided in the request.
`indexing_status`	`DocumentIndexingStatus`	Current indexing status of the dataset.
`permission`	`DatasetPermission`	Access control level: `only_me`, `all_team_members`, or `partial_members`.
`doc_form`	`ChunkingMode`	Default chunking mode: `text_model`, `qa_model`, or `hierarchical_model`.
`runtime_mode`	`'rag_pipeline' | 'general'`	Whether the dataset operates as a standard knowledge base or a RAG pipeline.
`embedding_model`	`string`	Name of the embedding model assigned to the dataset.
`embedding_model_provider`	`string`	Provider of the embedding model.
`retrieval_model`	`RetrievalConfig`	Default retrieval configuration including search method, top_k, and score threshold.
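The fields in the table can be pictured as the following TypeScript shape. This is a partial sketch, not the actual `DataSet` definition: the names ending in `Sketch` are hypothetical, `indexing_status` is typed as a plain string because the `DocumentIndexingStatus` values are not listed here, the `search_method` and `score_threshold` field names are assumptions, and the sample values are invented rather than Dify's defaults.

```typescript
// Union types taken from the values listed in the table above.
type DatasetPermission = 'only_me' | 'all_team_members' | 'partial_members'
type ChunkingMode = 'text_model' | 'qa_model' | 'hierarchical_model'

// Assumed field names for the retrieval configuration.
interface RetrievalConfigSketch {
  search_method: string
  top_k: number
  score_threshold: number
}

// Partial, illustrative view of the DataSet type.
interface DataSetSketch {
  id: string
  name: string
  indexing_status: string // placeholder for DocumentIndexingStatus
  permission: DatasetPermission
  doc_form: ChunkingMode
  runtime_mode: 'rag_pipeline' | 'general'
  embedding_model: string
  embedding_model_provider: string
  retrieval_model: RetrievalConfigSketch
}

// Formats a few of the fields into a one-line summary.
function summarizeDataset(ds: DataSetSketch): string {
  return `${ds.name} (${ds.id}): ${ds.doc_form}, ${ds.permission}, top_k=${ds.retrieval_model.top_k}`
}

// Invented sample values, for illustration only.
const exampleDataset: DataSetSketch = {
  id: 'dataset-1',
  name: 'Product Documentation',
  indexing_status: 'waiting',
  permission: 'only_me',
  doc_form: 'text_model',
  runtime_mode: 'general',
  embedding_model: 'example-embedding-model',
  embedding_model_provider: 'example-provider',
  retrieval_model: { search_method: 'semantic_search', top_k: 3, score_threshold: 0.5 },
}

console.log(summarizeDataset(exampleDataset))
```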

Usage Examples

Creating a new knowledge base

import { createEmptyDataset } from '@/service/datasets'

const dataset = await createEmptyDataset({ name: 'Product Documentation' })
console.log(dataset.id) // Use this ID for subsequent document uploads

Creating a dataset and immediately adding a document

import { createEmptyDataset, createDocument } from '@/service/datasets'

// fileId and processRule are assumed to be defined earlier by the caller
// (for example, the ID of an already uploaded file and the chosen processing rule).
const dataset = await createEmptyDataset({ name: 'FAQ Knowledge Base' })
const docResponse = await createDocument({
  datasetId: dataset.id,
  body: {
    data_source: {
      type: 'upload_file',
      info_list: {
        data_source_type: 'upload_file',
        file_info_list: { file_ids: [fileId] },
      },
    },
    doc_form: 'text_model',
    doc_language: 'English',
    process_rule: processRule,
    // Reuse the defaults the backend assigned when the dataset was created.
    retrieval_model: dataset.retrieval_model,
    embedding_model: dataset.embedding_model,
    embedding_model_provider: dataset.embedding_model_provider,
  },
})

Related Pages

Principle:Langgenius_Dify_Dataset_Creation
