

Heuristic:Risingwavelabs Risingwave Stream Chunk Sizing

From Leeroopedia



Knowledge Sources
Domains: Streaming, Optimization, Memory_Management
Last Updated: 2026-02-09 08:00 GMT

Overview

Stream chunk initial capacity defaults to 64 rows and is capped at 4096 rows, balancing memory allocation efficiency against batching throughput in the streaming engine.

Description

RisingWave's streaming engine processes data in chunks (batches of rows). The StreamChunkBuilder manages allocation of these chunks with a default initial capacity of 64 rows and a maximum initial capacity of 4096 rows. The actual initial capacity is the minimum of the configured chunk size and the 4096-row cap; when no size is configured, the 64-row default applies. This prevents over-allocation when large chunk sizes are configured but few rows arrive, while still allowing efficient batching.
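The clamping rule described above can be sketched in Rust as follows. Only the two constants come from the source; the function name and signature are illustrative, not RisingWave's actual API:

```rust
// Constants from src/common/src/array/stream_chunk_builder.rs.
const MAX_INITIAL_CAPACITY: usize = 4096;
const DEFAULT_INITIAL_CAPACITY: usize = 64;

// Hypothetical helper: compute the initial chunk capacity from an
// optionally configured chunk size, clamped to the 4096-row cap.
fn initial_capacity(configured_chunk_size: Option<usize>) -> usize {
    match configured_chunk_size {
        Some(size) => size.min(MAX_INITIAL_CAPACITY),
        None => DEFAULT_INITIAL_CAPACITY,
    }
}
```

Under this sketch, a configured chunk_size of 65536 still yields an initial capacity of 4096, and an unconfigured builder starts at 64 rows.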

Usage

Apply this heuristic when tuning streaming throughput vs memory trade-offs, debugging memory usage in stream executors, or configuring chunk_size for streaming queries. Understanding these bounds helps explain why increasing the configured chunk_size beyond 4096 does not increase initial allocation.

The Insight (Rule of Thumb)

  • Action: The stream chunk builder uses min(configured_chunk_size, 4096) as initial capacity, with 64 as the default when no size is specified.
  • Value: Default: 64 rows; Maximum initial: 4096 rows.
  • Trade-off: Smaller initial capacity (64) reduces memory waste when processing sparse streams. Larger capacity (up to 4096) improves throughput for high-volume streams by reducing reallocation.
  • DML Atomicity: For DML operations, a separate constant MAX_CHUNK_FOR_ATOMICITY = 32 limits how many chunks can be combined into a single atomic transaction.
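The reallocation trade-off in the bullets above can be demonstrated with a toy snippet. This models a chunk buffer as a plain `Vec` of row values, which is an assumption for illustration, not RisingWave's actual columnar layout:

```rust
// Count how many times the buffer's capacity changes while appending
// `rows` elements, starting from a given initial capacity.
fn realloc_count(initial_capacity: usize, rows: usize) -> usize {
    let mut buf: Vec<u64> = Vec::with_capacity(initial_capacity);
    let mut last_cap = buf.capacity();
    let mut count = 0;
    for i in 0..rows as u64 {
        buf.push(i);
        if buf.capacity() != last_cap {
            last_cap = buf.capacity();
            count += 1;
        }
    }
    count
}
```

Pre-allocating the full 4096 rows incurs zero reallocations when the chunk fills up, while starting at 64 rows pays several capacity-doubling steps; the 64-row default bets that most chunks are flushed before growing that far.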

Reasoning

Streaming systems must balance two competing concerns: (1) minimizing memory waste by not pre-allocating large buffers for streams that may produce few rows, and (2) maximizing throughput by reducing the overhead of frequent small allocations and message passing. The 64-row default was chosen empirically as a reasonable balance. The 4096-row cap prevents pathological allocation when chunk_size is set very high (e.g., 65536): the builder should not pre-allocate that much memory upfront, since most chunks are flushed before reaching maximum capacity.

The DML atomicity limit of 32 chunks ensures that transactional DML operations do not produce unbounded transaction sizes, which would increase commit latency and recovery time.
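A minimal sketch of the atomicity cap, assuming a batch that simply counts pending chunks (the `DmlBatch` type and its `push` method are hypothetical; only the constant comes from the source):

```rust
// Constant from src/stream/src/executor/dml.rs.
const MAX_CHUNK_FOR_ATOMICITY: usize = 32;

// Hypothetical batch of chunks pending commit as one atomic transaction.
struct DmlBatch {
    chunks: Vec<Vec<i64>>, // each inner Vec stands in for one stream chunk
}

impl DmlBatch {
    fn new() -> Self {
        DmlBatch { chunks: Vec::new() }
    }

    // Add a chunk to the pending transaction; returns true once the batch
    // hits the atomicity cap and must be committed before accepting more.
    fn push(&mut self, chunk: Vec<i64>) -> bool {
        self.chunks.push(chunk);
        self.chunks.len() >= MAX_CHUNK_FOR_ATOMICITY
    }
}
```

Capping the batch at 32 chunks bounds the size of any single commit, which keeps commit latency and recovery time predictable.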

Code evidence from src/common/src/array/stream_chunk_builder.rs lines 60-61, 70:

const MAX_INITIAL_CAPACITY: usize = 4096;
const DEFAULT_INITIAL_CAPACITY: usize = 64;

Code evidence from src/stream/src/executor/dml.rs line 61:

const MAX_CHUNK_FOR_ATOMICITY: usize = 32;
