Implementation: ClickHouse Poco MultipartReader
base/poco/Net/src/MultipartReader.cpp:1-305
Purpose
Implements the `Poco::Net::MultipartReader` and its supporting classes (`MultipartStreamBuf`, `MultipartIOS`, `MultipartInputStream`) for reading MIME multipart messages as defined by RFC 2046. The reader parses boundary-delimited message parts from an input stream, providing access to each part's headers and body as a sub-stream.
Code Reference
MultipartStreamBuf -- Boundary-Aware Reading
The core parsing logic lives in `readFromDevice`. Each call returns the data up to the next line ending in the underlying stream. When it finds a delimiter line instead, it returns 0, which ends the current part:
```cpp
int MultipartStreamBuf::readFromDevice(char* buffer, std::streamsize length)
{
    poco_assert(!_boundary.empty() && _boundary.length() < length - 6);

    static const int eof = std::char_traits<char>::eof();
    std::streambuf& buf = *_istr.rdbuf();

    int n = 0;
    int ch = buf.sbumpc();
    if (ch == eof) return -1;
    *buffer++ = (char) ch; ++n;
    if (ch == '\n' || (ch == '\r' && buf.sgetc() == '\n'))
    {
        // ... (abridged) A line ending was just read: check whether the next
        // line starts with "--" + boundary.
        // - Delimiter followed by a line ending: return 0, ending the current part.
        // - Delimiter followed by "--": set _lastPart and return 0 (closing delimiter).
        // - Anything else: the bytes read so far are ordinary part data.
    }
    // ... (abridged) Otherwise copy bytes up to, but not including, the next CR or LF.
    return n;
}
```
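To make the delimiter rule concrete, here is a standalone sketch that applies the same one-line-per-call strategy to any `std::streambuf`. `readChunk` and `readPart` are hypothetical names, not Poco API. The sketch assumes that a delimiter counts only right after a line ending, and it returns data in a `std::string` instead of filling a fixed-size buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical sketch modeled on the readFromDevice description (not the Poco
// source). Reads at most one line of part data into `out`.
// Returns the byte count in `out`, 0 when a delimiter line ends the current
// part, or -1 at end of input.
int readChunk(std::streambuf& buf, const std::string& boundary,
              std::string& out, bool& lastPart)
{
    using traits = std::char_traits<char>;
    assert(!boundary.empty());
    const int eof = traits::eof();

    out.clear();
    int ch = buf.sbumpc();
    if (ch == eof) return -1;
    out += traits::to_char_type(ch);

    if (ch == '\n' || (ch == '\r' && buf.sgetc() == '\n'))
    {
        if (ch == '\r') out += traits::to_char_type(buf.sbumpc()); // the '\n'

        // Try to match "--" + boundary at the start of the new line.
        const std::string expect = "--" + boundary;
        std::size_t i = 0;
        while (i < expect.size() && buf.sgetc() == traits::to_int_type(expect[i]))
        {
            out += traits::to_char_type(buf.sbumpc());
            ++i;
        }
        if (i == expect.size())
        {
            int next = buf.sgetc();
            if (next == '\n') { buf.sbumpc(); return 0; }             // delimiter + LF
            if (next == '\r')
            {
                buf.sbumpc();
                if (buf.sgetc() == '\n') { buf.sbumpc(); return 0; }  // delimiter + CRLF
                out += '\r';                                          // lone CR: data
            }
            else if (next == '-')
            {
                buf.sbumpc();
                if (buf.sgetc() == '-') { buf.sbumpc(); lastPart = true; return 0; }
                out += '-';                                           // "--b-x": data
            }
        }
    }
    // No delimiter: copy bytes up to, but not including, the next CR or LF.
    int c = buf.sgetc();
    while (c != eof && c != '\r' && c != '\n')
    {
        out += traits::to_char_type(buf.sbumpc());
        c = buf.sgetc();
    }
    return static_cast<int>(out.size());
}

// Drains one part: concatenates chunks until a delimiter (0) or end of input (-1).
std::string readPart(std::streambuf& buf, const std::string& boundary, bool& lastPart)
{
    std::string body, chunk;
    while (readChunk(buf, boundary, chunk, lastPart) > 0) body += chunk;
    return body;
}
```

Note that the line ending in front of the delimiter never reaches the body, because the call that read it returns 0. In the real reader this logic sits behind a stream buffer, so a part's `std::istream` simply reaches EOF; `readPart` stands in for that here.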
MultipartReader -- Part Iteration
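```cpp
void MultipartReader::nextPart(MessageHeader& messageHeader)
{
    if (!_pMPI)
    {
        // First call: locate the first delimiter line (guessing the boundary if none was given)
        if (_boundary.empty())
            guessBoundary();
        else
            findFirstBoundary();
    }
    else if (_pMPI->lastPart())
    {
        // The previous part's stream has already read the closing delimiter
        throw MultipartException("No more parts available");
    }
    parseHeader(messageHeader);  // part headers plus the blank line after them
    _pMPI = std::make_unique<MultipartInputStream>(_istr, _boundary);  // replaces the previous part's stream
}

bool MultipartReader::hasNextPart()
{
    // True before the first part, and until the closing delimiter has been read
    return (!_pMPI || !_pMPI->lastPart()) && _istr.good();
}
```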
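The two conditions in `hasNextPart` interact with the per-part stream: `lastPart()` only becomes true once that stream has read the closing delimiter. The hypothetical `PartCursorModel` reproduces just those flags (no I/O) to show the resulting call sequence; its names are illustrative and not Poco API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the iteration state in nextPart()/hasNextPart().
// It tracks only the flags those functions consult; no stream is read.
class PartCursorModel
{
public:
    bool hasNextPart() const
    {
        // Mirrors: (!_pMPI || !_pMPI->lastPart()) && _istr.good()
        return (!_started || !_lastPart) && _inputGood;
    }

    void nextPart()
    {
        if (_started && _lastPart)
            throw std::runtime_error("No more parts available");
        _started = true;    // first call would locate the first delimiter
        _lastPart = false;  // a fresh part stream has not seen a delimiter yet
    }

    // Stands in for the part stream reading "--boundary--" while being drained.
    void onClosingDelimiter() { assert(_started); _lastPart = true; }

    // Stands in for the underlying istream going bad.
    void onInputFailure() { _inputGood = false; }

private:
    bool _started = false;
    bool _lastPart = false;
    bool _inputGood = true;
};
```

In practice this suggests that a `hasNextPart`/`nextPart` loop should read each part's stream to EOF: if the last part is not drained, the closing delimiter is never seen and `lastPart()` stays false.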
Boundary Discovery
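If no boundary was passed to the constructor, the first `nextPart` call invokes `guessBoundary`. It skips leading whitespace, requires the next line to start with `--`, and takes the rest of that line as the boundary: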
```cpp
void MultipartReader::guessBoundary()
{
    static const int eof = std::char_traits<char>::eof();
    int ch = _istr.get();
    while (Poco::Ascii::isSpace(ch))
        ch = _istr.get();
    if (ch == '-' && _istr.peek() == '-')
    {
        _istr.get();
        ch = _istr.peek();
        // Accepts up to 128 characters; RFC 2046 allows at most 70
        while (ch != eof && ch != '\r' && ch != '\n' && _boundary.size() < 128)
        {
            _boundary += (char) _istr.get();
            ch = _istr.peek();
        }
        // ... (abridged) validate and consume the line ending
    }
    else throw MultipartException("No boundary line found");
}
```
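The same rules can be exercised in isolation with standard streams. `guessBoundarySketch` is a hypothetical name; it assumes the rules visible in the excerpt (skip whitespace, require `--`, cap at 128 characters, require a line ending) and throws `std::runtime_error` instead of `MultipartException`. Rejecting an empty boundary is this sketch's own choice.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical standalone version of the boundary-guessing step (not the Poco source).
std::string guessBoundarySketch(std::istream& in, std::size_t maxLen = 128)
{
    assert(maxLen > 0);
    const int eof = std::char_traits<char>::eof();

    int ch = in.get();
    while (ch != eof && std::isspace(ch)) ch = in.get();  // skip leading whitespace
    if (ch != '-' || in.peek() != '-')
        throw std::runtime_error("No boundary line found");
    in.get();  // consume the second '-'

    std::string boundary;
    ch = in.peek();
    while (ch != eof && ch != '\r' && ch != '\n' && boundary.size() < maxLen)
    {
        boundary += static_cast<char>(in.get());
        ch = in.peek();
    }
    // Reject an empty boundary, or one not ended by a line break within maxLen chars.
    if (boundary.empty() || (ch != '\r' && ch != '\n'))
        throw std::runtime_error("Invalid boundary line");

    in.get();  // consume '\r' or '\n'
    if (ch == '\r' && in.peek() == '\n') in.get();
    return boundary;
}
```

As in the excerpt, whitespace skipping also swallows blank lines before the first delimiter, but any other preamble text makes the guess fail.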
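First Boundary Search
When the boundary is known, `findFirstBoundary` skips the preamble line by line until it finds the first delimiter line (`--` followed by the boundary), and throws if no such line is found: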
```cpp
void MultipartReader::findFirstBoundary()
{
    std::string expect("--");
    expect.append(_boundary);
    std::string line;
    bool ok = true;
    do
    {
        ok = readLine(line, expect.length());
    }
    while (ok && line != expect);
    if (!ok) throw MultipartException("No boundary line found");
}
```
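The preamble skip can also be sketched with standard streams alone. `readLineCapped` and `skipToFirstBoundary` are hypothetical names, not Poco's `readLine`; the 1024-character cap mirrors the `readLine` limit noted on this page, and the sketch returns `bool` instead of throwing. It also strips trailing spaces and tabs before comparing, since RFC 2046 permits that transport padding after a delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line without its CR/LF terminator.
// Returns false at end of input or when the line exceeds maxLength
// characters, so malformed input cannot grow `line` without bound.
bool readLineCapped(std::istream& in, std::string& line, std::size_t maxLength = 1024)
{
    const int eof = std::char_traits<char>::eof();
    line.clear();
    int ch = in.peek();
    while (ch != eof && ch != '\r' && ch != '\n')
    {
        if (line.size() == maxLength) return false;  // over-long line
        line += static_cast<char>(in.get());
        ch = in.peek();
    }
    if (ch == eof) return false;
    in.get();  // consume '\r' or '\n'
    if (ch == '\r' && in.peek() == '\n') in.get();
    return true;
}

// Skips preamble lines until one equals "--" + boundary (after trimming
// trailing spaces/tabs). Leaves the stream at the first part's headers.
bool skipToFirstBoundary(std::istream& in, const std::string& boundary)
{
    assert(!boundary.empty());
    const std::string expect = "--" + boundary;
    std::string line;
    while (readLineCapped(in, line))
    {
        std::size_t last = line.find_last_not_of(" \t");
        if (last != std::string::npos) line.erase(last + 1);
        if (line == expect) return true;
    }
    return false;  // end of input or an over-long line
}
```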
I/O Contract
| Call / Input | Output | Side Effects |
|---|---|---|
| Constructor: `std::istream&` + optional `boundary` string | `MultipartReader` object | None; no data is read until the first `nextPart` call |
| `nextPart(MessageHeader&)` | Populated header for the next part | On the first call, locates (or guesses) the first delimiter line; reads the part headers and the blank line after them; creates a new `MultipartInputStream`; throws `MultipartException` if no more parts are available |
| `hasNextPart()` | `bool` | None (read-only check) |
| `stream()` | `std::istream&` for the current part's body | Throws if `nextPart` has not been called |
| `boundary()` | `const std::string&` | None (accessor); empty until guessed when no boundary was supplied |
Usage Examples
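```cpp
#include <Poco/Net/MessageHeader.h>
#include <Poco/Net/MultipartReader.h>
#include <Poco/StreamCopier.h>
#include <sstream>
#include <string>

// readParts is an illustrative wrapper; multipartData is assumed to hold a
// complete multipart body whose boundary is "----boundary123".
void readParts(const std::string& multipartData)
{
    // Reading a multipart message with a known boundary
    std::istringstream body(multipartData);
    Poco::Net::MultipartReader reader(body, "----boundary123");
    while (reader.hasNextPart())
    {
        Poco::Net::MessageHeader partHeader;
        reader.nextPart(partHeader);
        std::string contentType = partHeader.get("Content-Type", "");
        std::istream& partStream = reader.stream();
        // Read the part body; the stream reports EOF at the next delimiter line
        std::string partBody;
        Poco::StreamCopier::copyToString(partStream, partBody);
    }

    // Auto-detecting the boundary from the stream
    std::istringstream body2(multipartData);
    Poco::Net::MultipartReader reader2(body2);
    // The boundary is guessed from the first non-blank line when reader2.nextPart()
    // is first called; reader2.boundary() is empty until then.
}
```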
Internal Details
- `MultipartStreamBuf` extends `Poco::BufferedStreamBuf` with a fixed buffer size of `STREAM_BUFFER_SIZE`. It reads one character at a time from the underlying `std::streambuf` to detect boundary lines.
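- A delimiter is recognized only immediately after a line ending, i.e. as `\r\n--boundary` or `\n--boundary`. If the delimiter is followed by a line ending, the current part ends; if it is followed by `--`, it is the closing delimiter. Because `readFromDevice` returns 0 for the call that read the delimiter, the line ending in front of it is not delivered as part of the body.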
- The `readLine` helper limits line length to 1024 characters to prevent excessive memory usage from malformed input.
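- `guessBoundary` accepts boundaries of up to 128 characters. This is deliberately more lenient than RFC 2046, which limits a boundary to 70 characters (not counting the leading `--`), so longer boundaries from non-conforming senders are still accepted.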
- A `MultipartInputStream` is created for each part, wrapping the underlying stream with the boundary-aware `MultipartStreamBuf`. When the stream buffer detects a boundary, it returns 0 bytes, causing the stream to reach EOF for that part.
- The `_lastPart` flag is set when the closing boundary (`--boundary--`) is detected, preventing further iteration.
- The `parseHeader` method delegates to `MessageHeader::read` and consumes the blank line separator between headers and body.
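The `BufferedStreamBuf` mechanism described above, where a refill call that returns 0 makes the part stream report EOF, can be imitated with a plain `std::streambuf`. `PullStreamBuf` is a hypothetical class, not Poco's `BufferedStreamBuf`; it assumes the device callback uses the same return convention as `readFromDevice` (bytes read, 0 at a delimiter, -1 at end of input).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull-based stream buffer (not Poco's BufferedStreamBuf).
// It refills its get area by calling `device` and maps a result <= 0 to EOF.
class PullStreamBuf : public std::streambuf
{
public:
    using Device = std::function<int(char*, std::streamsize)>;

    explicit PullStreamBuf(Device device, std::size_t bufferSize = 1024)
        : _device(std::move(device)), _buffer(bufferSize)
    {
        assert(bufferSize > 0);
        char* base = _buffer.data();
        setg(base, base, base);  // start with an empty get area
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        char* base = _buffer.data();
        int n = _device(base, static_cast<std::streamsize>(_buffer.size()));
        if (n <= 0)
            return traits_type::eof();  // 0 or -1: the istream sees end of stream
        setg(base, base, base + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    Device _device;
    std::vector<char> _buffer;
};
```

Wrapping such a buffer in a `std::istream` gives each part an ordinary stream interface, so generic helpers like `Poco::StreamCopier::copyToString` need no knowledge of boundaries.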