Heuristic:Treeverse LakeFS S3 Multipart Size Constraint
| Knowledge Sources | |
|---|---|
| Domains | Optimization, S3_Gateway |
| Last Updated | 2026-02-08 10:00 GMT |
Overview
S3 multipart uploads require a minimum part size of 5 MiB for all parts except the last, as defined by the S3 protocol and enforced by the lakeFS S3 gateway.
Description
The S3 protocol specifies that each part of a multipart upload, except the last, must be at least 5 MiB (5,242,880 bytes). This is a hard constraint that lakeFS configuration cannot change, and the lakeFS S3 gateway enforces it identically to AWS S3. The integration test suite defines a 6 MiB constant for "large" objects so that multipart operations are exercised with one full-size part plus a smaller final part. For small objects it uses a 16-byte constant, which the source comments describe as intentionally not a round number, to avoid coincidental alignment.
Usage
Use this heuristic when implementing multipart upload workflows through the lakeFS S3 gateway, debugging multipart upload failures, or sizing data for upload operations. Objects smaller than 5 MiB should use simple PUT operations, not multipart uploads.
The Insight (Rule of Thumb)
- Action: Set multipart upload part sizes to at least 5 MiB (5,242,880 bytes). Only the final part can be smaller.
- Value: `minDataContentLengthForMultipart = 5 << 20` (5 MiB). Use `6 << 20` (6 MiB) as a safe test value.
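- Trade-off: Larger part sizes reduce the number of parts (fewer API calls) but increase memory usage per part. The S3 protocol also limits an upload to 10,000 parts, so at the 5 MiB minimum the largest possible object is 10,000 × 5 MiB (about 48.8 GiB); anything larger requires a larger part size.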
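The interaction between the 5 MiB minimum and the 10,000-part limit can be sketched as a part-size calculation. This is a minimal illustration, not lakeFS code: `choosePartSize`, `partCount`, `minPartSize`, and `maxParts` are hypothetical names introduced here, and the sketch ignores the upper bounds S3 places on part and object size.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minPartSize = 5 << 20 // 5 MiB: S3 minimum for every part except the last
	maxParts    = 10000   // S3 limit on the number of parts in one upload
)

// choosePartSize returns the smallest part size in bytes that is at least
// minPartSize and keeps the upload within maxParts parts.
// Hypothetical helper for illustration only.
func choosePartSize(objectSize int64) (int64, error) {
	if objectSize <= 0 {
		return 0, errors.New("object size must be positive")
	}
	// Ceiling division: smallest part size that fits the object in maxParts parts.
	partSize := (objectSize + maxParts - 1) / maxParts
	if partSize < minPartSize {
		partSize = minPartSize
	}
	return partSize, nil
}

// partCount returns how many parts an object needs at the given part size.
func partCount(objectSize, partSize int64) int64 {
	return (objectSize + partSize - 1) / partSize
}

func main() {
	for _, size := range []int64{6 << 20, 1 << 30, 100 << 30} {
		partSize, err := choosePartSize(size)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("object=%d bytes partSize=%d bytes parts=%d\n",
			size, partSize, partCount(size, partSize))
	}
}
```

For objects below roughly 48.8 GiB the function simply returns the 5 MiB minimum; only beyond that does the part-count limit force larger parts.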
Reasoning
The 5 MiB minimum is an AWS S3 protocol requirement inherited by S3-compatible systems, including the lakeFS gateway. Violating it causes the server to reject the upload with an `EntityTooSmall` error; on AWS S3 this error is reported when the multipart upload is completed, not when the undersized part is uploaded. The lakeFS test suite documents the constraint explicitly and uses it to verify correct gateway behavior. The 16-byte length for small test objects, described in the source as intentionally not a round number, is meant to prevent false positives from coincidental boundary alignment.
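A client can catch the problem before the server does by checking planned part sizes locally. The following is a minimal sketch under that assumption; `validatePartSizes` and `minPartSize` are hypothetical names and are not part of lakeFS or any S3 SDK.

```go
package main

import (
	"errors"
	"fmt"
)

const minPartSize = 5 << 20 // 5 MiB: S3 minimum for every part except the last

// validatePartSizes reports an error if any part other than the last is
// smaller than minPartSize. Hypothetical helper for illustration only.
func validatePartSizes(sizes []int64) error {
	if len(sizes) == 0 {
		return errors.New("multipart upload has no parts")
	}
	// The final part is exempt from the minimum, so it is skipped here.
	for i, size := range sizes[:len(sizes)-1] {
		if size < minPartSize {
			return fmt.Errorf("part %d is %d bytes, below the %d-byte minimum",
				i+1, size, minPartSize)
		}
	}
	return nil
}

func main() {
	ok := []int64{5 << 20, 1 << 20}  // full part followed by a short last part
	bad := []int64{1 << 20, 5 << 20} // short part that is not the last
	fmt.Println("ok plan:", validatePartSizes(ok))
	fmt.Println("bad plan:", validatePartSizes(bad))
}
```

The second plan is the kind of layout that a server enforcing the S3 rule would reject with `EntityTooSmall`.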
Code Evidence
Multipart size constants from `esti/esti_utils.go:370-382`:
```go
const (
	// randomDataContentLength is the content length used for small
	// objects.  It is intentionally not a round number.
	randomDataContentLength = 16

	// minDataContentLengthForMultipart is the content length for all
	// parts of a multipart upload except the last.  Its value -- 5MiB
	// -- is defined in the S3 protocol, and cannot be changed.
	minDataContentLengthForMultipart = 5 << 20

	// largeDataContentLength is >minDataContentLengthForMultipart,
	// which is large enough to require multipart operations.
	largeDataContentLength = 6 << 20
)
```
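To see why 6 MiB is enough to exercise multipart behavior, the sketch below splits a buffer of `largeDataContentLength` bytes into parts of `minDataContentLengthForMultipart` bytes. The two constants are copied from the snippet above; `splitIntoParts` is a hypothetical helper written for this illustration and is not taken from the lakeFS test suite.

```go
package main

import "fmt"

const (
	minDataContentLengthForMultipart = 5 << 20 // 5 MiB, from the constants above
	largeDataContentLength           = 6 << 20 // 6 MiB, from the constants above
)

// splitIntoParts slices data into consecutive parts of partSize bytes.
// The final part holds whatever remains and may be shorter.
// Hypothetical helper for illustration only.
func splitIntoParts(data []byte, partSize int) [][]byte {
	if partSize <= 0 {
		return nil
	}
	var parts [][]byte
	for start := 0; start < len(data); start += partSize {
		end := start + partSize
		if end > len(data) {
			end = len(data)
		}
		parts = append(parts, data[start:end])
	}
	return parts
}

func main() {
	data := make([]byte, largeDataContentLength)
	parts := splitIntoParts(data, minDataContentLengthForMultipart)
	for i, part := range parts {
		fmt.Printf("part %d: %d bytes\n", i+1, len(part))
	}
	// Prints:
	// part 1: 5242880 bytes
	// part 2: 1048576 bytes
}
```

The first part meets the 5 MiB minimum exactly and the 1 MiB remainder is the last part, which the protocol allows to be smaller.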