Implementation:Langgenius Dify CreateDocument
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| Dify | RAG, Knowledge_Management, Frontend | 2026-02-12 00:00 GMT |
Overview
Description
createDocument and createFirstDocument are frontend service functions that submit documents to the Dify backend for ingestion into a knowledge base. Both functions accept a CreateDocumentReq body that describes the data source, processing rules, chunking mode, embedding model, and retrieval model configuration. The backend then asynchronously executes the full document processing pipeline (parsing, cleaning, splitting, embedding, indexing).
createDocument targets an existing dataset, while createFirstDocument atomically creates a new dataset and its first document through the /datasets/init endpoint.
Usage
- Use createDocument when adding documents to a dataset that already exists.
- Use createFirstDocument during the initial knowledge base setup wizard, where dataset creation and first document upload happen in one step.
- The returned batch identifier can be used with fetchIndexingStatusBatch to monitor the progress of document processing.
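The batch-tracking step can be sketched as a small polling loop. This is a minimal illustration, not the Dify frontend's own logic: waitForBatch, StatusFetcher, the IndexingStatus shape, and the set of terminal status strings are all assumptions made for the example. The fetcher is passed in as a parameter so that a thin wrapper around fetchIndexingStatusBatch (whose exact signature is not shown here) could be supplied by the caller.

```typescript
// Hypothetical minimal shape of one status entry; the real Dify type may carry more fields.
type IndexingStatus = { id: string; indexing_status: string }

// Hypothetical fetcher signature, standing in for a wrapper around fetchIndexingStatusBatch.
type StatusFetcher = (batch: string) => Promise<{ data: IndexingStatus[] }>

// Status values treated as terminal in this sketch (an assumption, not a confirmed list).
const TERMINAL_STATUSES = new Set(['completed', 'error', 'paused'])

// Polls until every document in the batch reaches a terminal status,
// or gives up after maxAttempts polls.
async function waitForBatch(
  batch: string,
  fetchStatus: StatusFetcher,
  { intervalMs = 2500, maxAttempts = 120 } = {},
): Promise<IndexingStatus[]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { data } = await fetchStatus(batch)
    const finished = data.length > 0
      && data.every(doc => TERMINAL_STATUSES.has(doc.indexing_status))
    if (finished)
      return data
    // Wait before the next poll so the backend is not hammered.
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Batch ${batch} did not finish after ${maxAttempts} polls`)
}
```

Injecting the fetcher keeps the loop independent of the HTTP layer, which also makes it straightforward to exercise with a fake in tests.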
Code Reference
Source Location
web/service/datasets.ts, lines 133--139.
Signature
export const createDocument = (
  { datasetId, body }: { datasetId: string, body: CreateDocumentReq }
): Promise<createDocumentResponse> => {
  return post<createDocumentResponse>(`/datasets/${datasetId}/documents`, { body })
}

export const createFirstDocument = (
  { body }: { body: CreateDocumentReq }
): Promise<createDocumentResponse> => {
  return post<createDocumentResponse>('/datasets/init', { body })
}
Import
import { createDocument, createFirstDocument } from '@/service/datasets'
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| datasetId | string | Yes (for createDocument) | The ID of the target dataset. |
| body | CreateDocumentReq | Yes | Full document creation specification. |
CreateDocumentReq fields:
| Field | Type | Description |
|---|---|---|
| data_source | DataSource | Source configuration with type (upload_file, notion_import, website_crawl) and corresponding info_list. |
| doc_form | ChunkingMode | Chunking strategy: text_model, qa_model, or hierarchical_model. |
| doc_language | string | Language of the document content (e.g., 'English'). |
| process_rule | ProcessRule | Segmentation and pre-processing rules (separator, max_tokens, chunk_overlap, pre-processing toggles). |
| retrieval_model | RetrievalConfig | Retrieval configuration (search method, top_k, score threshold, reranking settings). |
| embedding_model | string | Name of the embedding model to use. |
| embedding_model_provider | string | Provider of the embedding model. |
| indexing_technique | IndexingType | Optional. Indexing technique override. |
| original_document_id | string | Optional. ID of an existing document being re-uploaded. |
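To make the nesting of data_source concrete, here is a minimal sketch of a helper that assembles the file-upload portion of a request body. The helper buildUploadRequest and the local types UploadFileSource and MinimalCreateDocumentReq are hypothetical and cover only a subset of the fields listed above; the real CreateDocumentReq type also carries process_rule, retrieval_model, and the embedding settings.

```typescript
type ChunkingMode = 'text_model' | 'qa_model' | 'hierarchical_model'

// Simplified local stand-in for the upload_file variant of DataSource (an assumption).
type UploadFileSource = {
  type: 'upload_file'
  info_list: {
    data_source_type: 'upload_file'
    file_info_list: { file_ids: string[] }
  }
}

// Hypothetical subset of CreateDocumentReq, limited to the fields this sketch fills in.
type MinimalCreateDocumentReq = {
  data_source: UploadFileSource
  doc_form: ChunkingMode
  doc_language: string
  original_document_id?: string
}

// Hypothetical helper: builds the upload_file part of a request body from file IDs.
function buildUploadRequest(
  fileIds: string[],
  options: { docForm?: ChunkingMode; docLanguage?: string; originalDocumentId?: string } = {},
): MinimalCreateDocumentReq {
  if (fileIds.length === 0)
    throw new Error('At least one uploaded file ID is required')

  const { docForm = 'text_model', docLanguage = 'English', originalDocumentId } = options
  return {
    data_source: {
      type: 'upload_file',
      info_list: {
        data_source_type: 'upload_file',
        file_info_list: { file_ids: [...fileIds] },
      },
    },
    doc_form: docForm,
    doc_language: docLanguage,
    // Only include the optional field when re-uploading an existing document.
    ...(originalDocumentId ? { original_document_id: originalDocumentId } : {}),
  }
}
```

A caller would merge the result with the remaining required fields before passing it as body.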
Outputs
Returns Promise<createDocumentResponse>:
| Field | Type | Description |
|---|---|---|
| dataset | DataSet \| undefined | The dataset object (present when using createFirstDocument). |
| batch | string | Batch identifier for tracking processing progress. |
| documents | InitialDocumentDetail[] | Array of created document records with their initial indexing status. |
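Because dataset may be undefined when the response comes from createDocument, callers typically need a fallback to the dataset ID they already hold. The following sketch shows one way to normalize both cases; resolveTrackingInfo and the MinimalResponse type are hypothetical, and the id field on each document record is assumed rather than confirmed by the table above.

```typescript
// Hypothetical subset of createDocumentResponse used by this sketch.
type MinimalResponse = {
  dataset?: { id: string }
  batch: string
  documents: { id: string }[]
}

// Hypothetical helper: collects what is needed to track a batch,
// regardless of which of the two functions produced the response.
function resolveTrackingInfo(
  response: MinimalResponse,
  knownDatasetId?: string,
): { datasetId: string; batch: string; documentIds: string[] } {
  // createFirstDocument returns the new dataset; createDocument callers pass their own ID.
  const datasetId = response.dataset?.id ?? knownDatasetId
  if (!datasetId)
    throw new Error('No dataset ID in the response and none supplied by the caller')

  return {
    datasetId,
    batch: response.batch,
    documentIds: response.documents.map(doc => doc.id),
  }
}
```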
Usage Examples
Uploading a file to an existing dataset
import { createDocument } from '@/service/datasets'

const response = await createDocument({
  datasetId: 'ds-abc123',
  body: {
    data_source: {
      type: 'upload_file',
      info_list: {
        data_source_type: 'upload_file',
        file_info_list: { file_ids: ['file-xyz789'] },
      },
    },
    doc_form: 'text_model',
    doc_language: 'English',
    process_rule: {
      mode: 'custom',
      rules: {
        pre_processing_rules: [{ id: 'remove_extra_spaces', enabled: true }],
        segmentation: { separator: '\n\n', max_tokens: 500, chunk_overlap: 50 },
        parent_mode: 'full-doc',
        subchunk_segmentation: { separator: '\n', max_tokens: 200 },
      },
    },
    retrieval_model: {
      search_method: 'semantic_search',
      top_k: 3,
      score_threshold_enabled: true,
      score_threshold: 0.5,
      reranking_enable: false,
      reranking_model: { reranking_provider_name: '', reranking_model_name: '' },
    },
    embedding_model: 'text-embedding-ada-002',
    embedding_model_provider: 'openai',
  },
})

console.log(response.batch) // Use to track indexing progress
Creating a dataset with its first document
import { createFirstDocument } from '@/service/datasets'

const response = await createFirstDocument({
  body: {
    data_source: {
      type: 'upload_file',
      info_list: {
        data_source_type: 'upload_file',
        file_info_list: { file_ids: ['file-first001'] },
      },
    },
    doc_form: 'qa_model',
    doc_language: 'English',
    process_rule: {
      mode: 'custom',
      rules: {
        pre_processing_rules: [],
        segmentation: { separator: '\n', max_tokens: 1000 },
        parent_mode: 'full-doc',
        subchunk_segmentation: { separator: '\n', max_tokens: 300 },
      },
    },
    retrieval_model: {
      search_method: 'hybrid_search',
      top_k: 5,
      score_threshold_enabled: false,
      score_threshold: 0,
      reranking_enable: true,
      reranking_model: { reranking_provider_name: 'cohere', reranking_model_name: 'rerank-english-v2.0' },
    },
    embedding_model: 'text-embedding-3-small',
    embedding_model_provider: 'openai',
  },
})

// The dataset object is present because /datasets/init creates it.
const newDatasetId = response.dataset?.id