Principle:FlowiseAI Flowise Chat Prediction

Attribute	Value
Source Repository	FlowiseAI/Flowise
Domain	Chatflow_Creation
Workflow	Chatflow_Creation
Last Updated	2026-02-12

Overview

Technique for sending a user message to an AI chatflow and receiving the generated response, optionally streamed. The chatflow runs its node graph (LLM calls, tools, retrievers) on the message and returns the result either as one complete response (synchronous mode) or incrementally as Server-Sent Events (streaming mode).

Motivation

The prediction system is the execution layer that brings a visual flow graph to life. Without it, a chatflow would be a static diagram. The prediction system:

Enables testing -- Developers can test their chatflows directly in the canvas editor without deploying.
Provides real-time feedback -- Streaming mode delivers partial responses as they are generated, improving perceived responsiveness for long-running LLM calls.
Supports rich responses -- Beyond plain text, responses can include source documents (for RAG), tool usage logs, agent reasoning traces, and file annotations.
Maintains conversation state -- Session IDs enable multi-turn conversations where the chatflow can reference previous messages.

Description

The Chat Prediction principle follows a request-response pattern with optional streaming via Server-Sent Events (SSE). Each prediction request carries the user's message and, optionally, a conversation session ID, file uploads, runtime configuration overrides, and a streaming flag.

Request Structure

A prediction request contains:

{
    question: string,           // The user's message text
    chatId?: string,            // Optional: conversation session ID for multi-turn
    uploads?: Upload[],         // Optional: file uploads (images, documents)
    overrideConfig?: object,    // Optional: runtime configuration overrides
    streaming?: boolean         // Optional: whether to use streaming mode
}
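A minimal TypeScript sketch of assembling a request body from the fields above. The `buildPredictionRequest` helper and the `Upload` shape are hypothetical illustrations, not Flowise's own types; the point is that optional fields are left out of the JSON entirely when they carry no information, which is how a first message can simply omit `chatId`.

```typescript
// Hypothetical upload shape for illustration; Flowise's real upload type may differ.
interface Upload {
    name: string;
    mime: string;
    data: string; // assumed: base64 data URL or remote URL
}

interface PredictionRequest {
    question: string;
    chatId?: string;
    uploads?: Upload[];
    overrideConfig?: Record<string, unknown>;
    streaming?: boolean;
}

// Hypothetical client-side helper: requires a non-empty question and only
// copies optional fields that were actually provided (empty uploads are dropped).
function buildPredictionRequest(
    question: string,
    options: Omit<PredictionRequest, "question"> = {}
): PredictionRequest {
    const trimmed = question.trim();
    if (trimmed.length === 0) {
        throw new Error("question must not be empty");
    }
    const request: PredictionRequest = { question: trimmed };
    if (options.chatId) request.chatId = options.chatId;
    if (options.uploads && options.uploads.length > 0) request.uploads = options.uploads;
    if (options.overrideConfig) request.overrideConfig = options.overrideConfig;
    if (options.streaming !== undefined) request.streaming = options.streaming;
    return request;
}

// First turn: no chatId yet, so the server is expected to assign one.
const firstTurn = buildPredictionRequest("What is Flowise?");
// Follow-up turn: reuse the chatId returned by the previous response.
const followUp = buildPredictionRequest("Tell me more.", { chatId: "abc-123", uploads: [] });

console.log(JSON.stringify(firstTurn)); // {"question":"What is Flowise?"}
console.log(JSON.stringify(followUp));  // {"question":"Tell me more.","chatId":"abc-123"}
```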

Response Structure

A prediction response contains:

{
    text: string,                    // The generated response text
    chatMessageId: string,           // Unique ID for this message exchange
    sourceDocuments?: Document[],    // Optional: retrieved source documents (RAG)
    usedTools?: ToolUsage[],         // Optional: tools invoked during processing
    agentReasoning?: Reasoning[],    // Optional: agent's step-by-step reasoning
    fileAnnotations?: Annotation[],  // Optional: file references in the response
    chatId: string                   // The conversation session ID
}
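Because most response fields are optional, client code usually normalizes a parsed response before rendering it. A minimal sketch, assuming the field names above; `normalizePredictionResponse` is a hypothetical helper, and the element types are left as `unknown` because the shapes of `Document`, `ToolUsage`, `Reasoning`, and `Annotation` are not specified here.

```typescript
interface PredictionResponse {
    text: string;
    chatMessageId: string;
    chatId: string;
    sourceDocuments: unknown[];
    usedTools: unknown[];
    agentReasoning: unknown[];
    fileAnnotations: unknown[];
}

// Returns the value if it is an array, otherwise an empty array.
function arrayOrEmpty(value: unknown): unknown[] {
    return Array.isArray(value) ? value : [];
}

// Hypothetical helper: checks the required string fields and defaults every
// optional facet to an empty array so rendering code needs no null checks.
function normalizePredictionResponse(raw: unknown): PredictionResponse {
    if (typeof raw !== "object" || raw === null) {
        throw new Error("response is not a JSON object");
    }
    const r = raw as Record<string, unknown>;
    for (const key of ["text", "chatMessageId", "chatId"]) {
        if (typeof r[key] !== "string") {
            throw new Error(`missing or non-string field: ${key}`);
        }
    }
    return {
        text: r.text as string,
        chatMessageId: r.chatMessageId as string,
        chatId: r.chatId as string,
        sourceDocuments: arrayOrEmpty(r.sourceDocuments),
        usedTools: arrayOrEmpty(r.usedTools),
        agentReasoning: arrayOrEmpty(r.agentReasoning),
        fileAnnotations: arrayOrEmpty(r.fileAnnotations),
    };
}

const parsed = normalizePredictionResponse(
    JSON.parse('{"text":"Hello!","chatMessageId":"m1","chatId":"c1","usedTools":[{"tool":"calculator"}]}')
);
console.log(parsed.usedTools.length);       // 1
console.log(parsed.sourceDocuments.length); // 0
```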

Execution Modes

The system supports three endpoint variants:

Internal prediction (non-streaming) -- POST /api/v1/internal-prediction/{id} -- Used within the Flowise UI for testing chatflows. Returns a complete response as a single JSON payload.
Internal prediction (streaming) -- POST /api/v1/internal-prediction/stream/{id} -- Used within the Flowise UI when the chatflow supports streaming. Returns partial responses as SSE events.
Public prediction -- POST /api/v1/prediction/{id} -- The externally accessible endpoint for deployed chatflows. Used by embedded chat widgets and API consumers.
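The three variants map onto a small URL lookup, and the non-streaming call is an ordinary JSON POST. A sketch under stated assumptions: the paths are the ones listed above, while `baseUrl`, `predictionUrl`, `predict`, and the `FetchLike` type are hypothetical names. The fetch function is injected so the call can be exercised without a running server, and any authentication header a deployed chatflow may require is omitted.

```typescript
type PredictionMode = "internal" | "internal-stream" | "public";

// Maps each endpoint variant to its path; {id} is the chatflow ID.
function predictionUrl(baseUrl: string, chatflowId: string, mode: PredictionMode): string {
    const id = encodeURIComponent(chatflowId);
    switch (mode) {
        case "internal":
            return `${baseUrl}/api/v1/internal-prediction/${id}`;
        case "internal-stream":
            return `${baseUrl}/api/v1/internal-prediction/stream/${id}`;
        case "public":
            return `${baseUrl}/api/v1/prediction/${id}`;
    }
}

// Minimal subset of the Fetch API used here; the global fetch is compatible,
// and a stub can stand in for the network.
type FetchLike = (
    url: string,
    init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Non-streaming call: one POST, one complete JSON payload back.
async function predict(
    fetchFn: FetchLike,
    baseUrl: string,
    chatflowId: string,
    body: { question: string; chatId?: string }
): Promise<unknown> {
    const res = await fetchFn(predictionUrl(baseUrl, chatflowId, "public"), {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
    });
    if (!res.ok) {
        throw new Error(`prediction failed with HTTP ${res.status}`);
    }
    return res.json();
}

// Usage against a stub instead of a live server:
const stubFetch: FetchLike = async (url, init) => ({
    ok: true,
    status: 200,
    json: async () => ({ text: `you asked: ${JSON.parse(init.body).question}`, url }),
});
predict(stubFetch, "http://localhost:3000", "my-chatflow-id", { question: "Hi" }).then((r) => console.log(r));
```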

Streaming vs. Non-Streaming Decision

The UI decides which mode to use by calling getIsChatflowStreaming, which asks the server whether the chatflow can stream, before the first message is sent. If the chatflow's node graph supports streaming (i.e., its LLM node can emit output token by token), the streaming endpoint is used. Otherwise, the non-streaming endpoint is used.
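The decision reduces to one boolean check that selects between the two internal endpoints. A sketch assuming the check is exposed to the client as an async function returning a boolean: `checkIsStreaming` is a hypothetical stand-in for the UI's getIsChatflowStreaming call, whose exact response shape is not described here. Falling back to non-streaming when the check fails is a choice made in this sketch, not documented Flowise behavior.

```typescript
type Endpoint = { path: string; streaming: boolean };

// Hypothetical stand-in for getIsChatflowStreaming: resolves to true when the
// chatflow's node graph can emit tokens incrementally.
type StreamingCheck = (chatflowId: string) => Promise<boolean>;

async function chooseEndpoint(chatflowId: string, checkIsStreaming: StreamingCheck): Promise<Endpoint> {
    let streaming = false;
    try {
        streaming = await checkIsStreaming(chatflowId);
    } catch {
        // If the check itself fails, fall back to the simpler non-streaming mode.
        streaming = false;
    }
    const id = encodeURIComponent(chatflowId);
    return streaming
        ? { path: `/api/v1/internal-prediction/stream/${id}`, streaming: true }
        : { path: `/api/v1/internal-prediction/${id}`, streaming: false };
}

// Usage with a stubbed check:
chooseEndpoint("my-chatflow-id", async () => true).then((e) => console.log(e.path));
```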

Conversation Sessions

Each conversation session is identified by a chatId. The first message in a conversation may omit the chatId, and the server will assign one in the response. Subsequent messages include the chatId to maintain conversational context (e.g., memory, chat history).
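On the server side, this amounts to assigning an ID when the request has none and keying the history by it. A minimal in-memory sketch: the `ConversationStore` class, the `handlePrediction` function, and the counter-based ID scheme are hypothetical illustrations (a real server would persist history and generate UUIDs), and the reply text is a placeholder where the chatflow's node graph would run.

```typescript
interface StoredMessage {
    role: "user" | "assistant";
    content: string;
}

// Hypothetical in-memory store: conversation history keyed by chatId.
class ConversationStore {
    private histories = new Map<string, StoredMessage[]>();
    private counter = 0;

    // Uses the caller's chatId if given; otherwise assigns a new one.
    resolveChatId(chatId?: string): string {
        const id = chatId ?? `chat-${++this.counter}`;
        if (!this.histories.has(id)) this.histories.set(id, []);
        return id;
    }

    append(chatId: string, message: StoredMessage): void {
        const history = this.histories.get(chatId);
        if (!history) throw new Error(`unknown chatId: ${chatId}`);
        history.push(message);
    }

    history(chatId: string): readonly StoredMessage[] {
        return this.histories.get(chatId) ?? [];
    }
}

// Hypothetical handler: resolves the session, records both turns, and
// returns the chatId so the client can send it with the next message.
function handlePrediction(store: ConversationStore, question: string, chatId?: string) {
    const id = store.resolveChatId(chatId);
    const priorMessages = store.history(id).length;
    store.append(id, { role: "user", content: question });
    const text = `echo (${priorMessages} prior messages): ${question}`; // placeholder reply
    store.append(id, { role: "assistant", content: text });
    return { text, chatId: id };
}

const store = new ConversationStore();
const first = handlePrediction(store, "Hi");              // no chatId: one is assigned
const second = handlePrediction(store, "Again", first.chatId);
console.log(second.text); // echo (2 prior messages): Again
```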

The client persists the session in the browser's localStorage under a key of the form {chatflowId}_INTERNAL, so a conversation survives page refreshes.
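A sketch of the client side, assuming the stored value is a JSON object with a `chatId` field (the exact stored shape is an assumption). The helper names are hypothetical, and storage is injected through a minimal `KeyValueStorage` interface that the browser's `localStorage` satisfies, so the same code runs against an in-memory map outside the browser.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies this interface.
interface KeyValueStorage {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
    removeItem(key: string): void;
}

const sessionKey = (chatflowId: string): string => `${chatflowId}_INTERNAL`;

// Returns the saved chatId, or undefined if there is none or the entry is unreadable.
function loadChatId(storage: KeyValueStorage, chatflowId: string): string | undefined {
    const raw = storage.getItem(sessionKey(chatflowId));
    if (raw === null) return undefined;
    try {
        const parsed = JSON.parse(raw) as { chatId?: unknown };
        return typeof parsed.chatId === "string" ? parsed.chatId : undefined;
    } catch {
        return undefined;
    }
}

// Called after each response so a page refresh resumes the same conversation.
function saveChatId(storage: KeyValueStorage, chatflowId: string, chatId: string): void {
    storage.setItem(sessionKey(chatflowId), JSON.stringify({ chatId }));
}

// Forgets the session so the next message starts a new conversation.
function clearChatId(storage: KeyValueStorage, chatflowId: string): void {
    storage.removeItem(sessionKey(chatflowId));
}

// In-memory stand-in for localStorage outside the browser.
class MemoryStorage implements KeyValueStorage {
    private data = new Map<string, string>();
    getItem(key: string): string | null {
        return this.data.has(key) ? (this.data.get(key) as string) : null;
    }
    setItem(key: string, value: string): void {
        this.data.set(key, value);
    }
    removeItem(key: string): void {
        this.data.delete(key);
    }
}

const mem = new MemoryStorage();
saveChatId(mem, "cf1", "c-42");
console.log(loadChatId(mem, "cf1")); // c-42
```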

Theoretical Basis

This principle is grounded in the request-response pattern with optional SSE streaming:

Synchronous request-response -- The standard mode where the client sends a request and waits for a complete response. Simple to implement and reason about, but introduces latency for long-running LLM generations.
Server-Sent Events (SSE) -- The streaming mode uses SSE to deliver partial responses as they become available. This is a unidirectional streaming protocol where the server pushes events to the client over a single HTTP connection. SSE is preferred over WebSockets here because the communication is inherently unidirectional (server to client) and SSE works with standard HTTP infrastructure.
Session management -- The chatId mechanism implements a stateful conversation protocol over a stateless HTTP transport. The server maintains conversation history indexed by chatId, while the client persists the chatId in localStorage for session recovery.
Multi-modal response -- The response structure supports multiple output modalities (text, documents, tool logs, reasoning traces, file annotations), reflecting the composable nature of the chatflow's node graph where different node types contribute different output facets.
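The SSE wire format is line-oriented: `field: value` lines, with a blank line ending each event. A minimal incremental parser shows why network chunks split at arbitrary points still yield whole events. It covers only the `event`, `data`, and `id` fields plus comment lines (no `retry`, no bare-CR line endings), and the `{ event, data }` JSON token payload in the usage lines is an assumed shape for illustration, not a documented Flowise format.

```typescript
interface SseEvent {
    event: string; // "message" when no event: field was sent
    data: string;
    id?: string;
}

// Incremental parser: feed() accepts arbitrary chunks and returns the events
// completed by that chunk. Handles LF and CRLF line endings (not bare CR).
class SseParser {
    private buffer = "";
    private eventType = "";
    private dataLines: string[] = [];
    private lastEventId: string | undefined;

    feed(chunk: string): SseEvent[] {
        this.buffer += chunk;
        const events: SseEvent[] = [];
        let newline: number;
        while ((newline = this.buffer.indexOf("\n")) !== -1) {
            let line = this.buffer.slice(0, newline);
            this.buffer = this.buffer.slice(newline + 1);
            if (line.endsWith("\r")) line = line.slice(0, -1);
            const event = this.processLine(line);
            if (event) events.push(event);
        }
        return events;
    }

    private processLine(line: string): SseEvent | undefined {
        if (line === "") return this.dispatch();      // blank line ends the event
        if (line.startsWith(":")) return undefined;   // comment / keep-alive
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "event") this.eventType = value;
        else if (field === "data") this.dataLines.push(value);
        else if (field === "id") this.lastEventId = value;
        // Other fields (e.g. retry) are ignored in this sketch.
        return undefined;
    }

    private dispatch(): SseEvent | undefined {
        const hadData = this.dataLines.length > 0;
        const event: SseEvent = {
            event: this.eventType || "message",
            data: this.dataLines.join("\n"),
            id: this.lastEventId,
        };
        this.eventType = "";
        this.dataLines = [];
        return hadData ? event : undefined; // events without data are not dispatched
    }
}

// Usage: accumulate streamed tokens; the chunks split a frame mid-line on purpose.
const parser = new SseParser();
let answer = "";
const chunks = [
    'data: {"event":"token","data":"Hel"}\n',
    '\ndata: {"event":"tok',
    'en","data":"lo"}\n\n: keep-alive\n\n',
];
for (const chunk of chunks) {
    for (const ev of parser.feed(chunk)) {
        const payload = JSON.parse(ev.data) as { event: string; data: string };
        if (payload.event === "token") answer += payload.data;
    }
}
console.log(answer); // Hello
```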

Usage

Use this principle when testing or deploying a chatflow that processes natural language inputs and generates AI responses. It applies to:

Testing chatflows in the canvas editor's built-in chat panel.
Deploying chatflows as API endpoints for external consumption.
Implementing streaming UIs that display partial responses in real-time.
Building multi-turn conversational experiences with session persistence.

Related Pages

Implementation:FlowiseAI_Flowise_SendMessageAndGetPrediction
