repr: add RowArenaBuf, a growable arena-backed byte buffer#37115

Open
frankmcsherry wants to merge 2 commits into MaterializeInc:main from frankmcsherry:rowarena-writer

@frankmcsherry

Contributor

What

Adds RowArena::writer() → RowArenaBuf: a Vec-shaped, writeable byte buffer backed by the arena. You build a value incrementally and commit it:

let mut w = arena.writer();
w.extend_from_slice(b"hello");
w.push(b'!');
let bytes: &[u8] = w.finish(); // valid for the arena's lifetime

It supports push, extend_from_slice, and std::io::Write, and reads back as &[u8] (via Deref/as_slice). finish() commits the bytes into the arena and returns a reference valid for the arena's lifetime.
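For readers who want to see the shape of this API in isolation, here is a minimal, self-contained sketch. ToyArena, ToyBuf, and build_greeting are hypothetical stand-ins, not the types in this PR. The sketch builds into a plain Vec and has finish copy the bytes into a region via push_bytes, as the commit message describes; it does not model growth or pooling.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::ops::Deref;

/// Hypothetical stand-in for the arena: owns committed byte regions.
pub struct ToyArena {
    // Raw pointers obtained from `Box::into_raw`; freed in `Drop`.
    regions: RefCell<Vec<*mut [u8]>>,
}

impl ToyArena {
    pub fn new() -> Self {
        ToyArena { regions: RefCell::new(Vec::new()) }
    }

    /// Copies `bytes` into a fresh region and returns a reference that
    /// is valid for as long as the arena is borrowed.
    pub fn push_bytes<'a>(&'a self, bytes: &[u8]) -> &'a [u8] {
        let ptr: *mut [u8] = Box::into_raw(Box::<[u8]>::from(bytes));
        self.regions.borrow_mut().push(ptr);
        // SAFETY: `ptr` came from `Box::into_raw`, is never written through
        // or freed before `Drop`, and `Drop` cannot run while `&'a self` lives.
        unsafe { &*ptr }
    }

    /// Starts a new incremental buffer (assumed shape of `writer()`).
    pub fn writer(&self) -> ToyBuf<'_> {
        ToyBuf { arena: self, buf: Vec::new() }
    }
}

impl Drop for ToyArena {
    fn drop(&mut self) {
        for ptr in self.regions.get_mut().drain(..) {
            // SAFETY: each pointer was produced by `Box::into_raw` exactly once.
            drop(unsafe { Box::from_raw(ptr) });
        }
    }
}

/// Hypothetical stand-in for the writer: Vec-shaped, committed by `finish`.
pub struct ToyBuf<'a> {
    arena: &'a ToyArena,
    buf: Vec<u8>,
}

impl<'a> ToyBuf<'a> {
    pub fn push(&mut self, byte: u8) {
        self.buf.push(byte);
    }

    pub fn extend_from_slice(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Consumes the writer, so no reference into the scratch can outlive it.
    pub fn finish(self) -> &'a [u8] {
        self.arena.push_bytes(&self.buf)
    }
}

impl Deref for ToyBuf<'_> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.buf
    }
}

impl Write for ToyBuf<'_> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Builds one value piecewise, mixing direct pushes with `write!`.
pub fn build_greeting(arena: &ToyArena) -> &[u8] {
    let mut w = arena.writer();
    w.extend_from_slice(b"hello");
    w.push(b'!');
    write!(w, " n={}", 7).expect("writing to memory cannot fail");
    w.finish()
}
```

With these stand-ins, the slice returned by build_greeting(&arena) stays readable after later writers on the same arena commit, because each finish lands in its own region.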

Why

With the region-backed RowArena (#37114), push_bytes copies an already-assembled slice into a region. A producer that assembles bytes piecewise — decoding a compressed row, say — would otherwise build into a scratch buffer and then have push_bytes copy it in. RowArenaBuf lets it write directly into arena storage, removing the scratch and the commit-time copy.

Growth doubles capacity and recycles the previous allocation through a small bounded pool on the arena (MAX_FREE_BUFFERS) rather than freeing it; an abandoned (un-finished) buffer is recycled on drop, and clear feeds the same pool. So a steady-state producer reuses allocations across rows.
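To make the growth-and-recycle policy concrete, here is a rough sketch in safe Rust. BufferPool, reserve_pooled, and the value 4 for MAX_FREE_BUFFERS are assumptions for illustration only; the actual pool in this PR may pick buffers, size them, or evict them differently.

```rust
/// Assumed bound on pooled buffers; the real constant's value is not stated here.
pub const MAX_FREE_BUFFERS: usize = 4;

/// Hypothetical bounded pool of cleared, reusable byte buffers.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    pub fn new() -> Self {
        BufferPool { free: Vec::new() }
    }

    /// Number of buffers currently held for reuse.
    pub fn pooled(&self) -> usize {
        self.free.len()
    }

    /// Returns an empty buffer with at least `min_capacity`, reusing a
    /// pooled allocation when one is large enough.
    pub fn take(&mut self, min_capacity: usize) -> Vec<u8> {
        match self.free.iter().position(|b| b.capacity() >= min_capacity) {
            Some(i) => self.free.swap_remove(i),
            None => Vec::with_capacity(min_capacity),
        }
    }

    /// Clears `buf` and keeps its allocation if the pool has room;
    /// otherwise the allocation is simply dropped.
    pub fn recycle(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        if buf.capacity() > 0 && self.free.len() < MAX_FREE_BUFFERS {
            self.free.push(buf);
        }
    }
}

/// Ensures `buf` has room for `additional` more bytes. On growth the
/// capacity at least doubles, the contents are copied into the new
/// buffer, and the old allocation goes back to the pool instead of
/// being freed.
pub fn reserve_pooled(pool: &mut BufferPool, buf: &mut Vec<u8>, additional: usize) {
    let needed = buf.len() + additional;
    if needed <= buf.capacity() {
        return;
    }
    let new_cap = needed.max(buf.capacity() * 2).max(8);
    let mut bigger = pool.take(new_cap);
    bigger.extend_from_slice(&buf[..]);
    let old = std::mem::replace(buf, bigger);
    pool.recycle(old);
}
```

In this sketch a producer that calls reserve_pooled before each write leaves its outgrown allocations in the pool, so a later pool.take can be served without touching the allocator. An abandoned buffer would go through recycle in the same way.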

Safety

The key invariant: no reference into the buffer escapes before finish, so the buffer is free to relocate (double-and-copy) while being built. Only finish hands out a &'a [u8], and from then on those bytes live in a committed region that is never reallocated while it holds data — the same rule push_bytes relies on in #37114.

Tests

mz_ore::tests (run under miri) cover incremental builds across many growth steps, io::Write, pooled reuse across writers, reference validity after a later writer commits, and empty/abandoned writers.

Stacking

Builds on #37114 (region-backed RowArena). No call sites adopt the writer yet; wiring the per-column decode path onto it is a follow-up.

No behavior change for existing callers; no release note.

@frankmcsherry frankmcsherry requested a review from a team as a code owner June 17, 2026 20:48
@frankmcsherry frankmcsherry force-pushed the rowarena-writer branch 2 times, most recently from f36ce65 to 5ed25c6 Compare June 17, 2026 21:12
frankmcsherry and others added 2 commits June 18, 2026 10:24
Adds `RowArena::writer()` returning a `RowArenaBuf`: a Vec-shaped, writeable
byte buffer that builds a value into a single reusable scratch buffer on the
arena. Write into it (push / extend_from_slice / io::Write / fmt::Write), read it
back as `&[u8]`, and `finish()` / `finish_str()` copies the assembled bytes into
a region (via `push_bytes`), returning a reference valid for the arena's
lifetime.

This lets a producer that assembles bytes piecewise — e.g. decoding a row, or
formatting a cast-to-string result with `write!` — build incrementally without
managing its own scratch allocation; the scratch is retained (cleared, capacity
kept) across writers, so it stops allocating once it reaches its high-water mark.
One writer may be live at a time.

Tests cover incremental builds, io::Write and fmt::Write, scratch reuse across
writers with the earlier result staying valid, finish_str, and empty/abandoned
writers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`RowArena::writer` returns a `RowArenaBuf`, so the type is part of the public
API and must be nameable by callers. Exporting it also fixes the `bin/doc`
(`cargo doc --document-private-items`) failure where `writer`'s public docs
linked to `RowArenaBuf` methods that rustdoc considered private because the type
was unreachable from the crate root.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DAlperin

Member

I think this is good but I have two questions:

> RowArenaBuf lets it write directly into arena storage, removing the scratch and the commit-time copy.

I think this doesn't actually remove the commit-time copy, right? As I understand it, finish calls push_bytes, which does region.extend_from_slice, which copies.

This still solves the alloc churn problem though, which is good.

The other thing to think about is that writer() has a reentrancy risk: you could imagine a nested decode that tries to get a mutable reference to the same underlying stash.
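To illustrate the reentrancy concern, here is a small sketch that assumes the shared scratch sits behind a RefCell, which may not match the actual implementation. ScratchArena, try_writer, and nested_decode are hypothetical names. With a plain borrow_mut the nested request would panic at runtime; try_borrow_mut lets the caller observe the conflict instead.

```rust
use std::cell::{RefCell, RefMut};

/// Hypothetical arena fragment holding only the shared scratch buffer.
pub struct ScratchArena {
    scratch: RefCell<Vec<u8>>,
}

impl ScratchArena {
    pub fn new() -> Self {
        ScratchArena { scratch: RefCell::new(Vec::new()) }
    }

    /// Returns the cleared scratch buffer, or `None` if a writer is
    /// already live on this arena.
    pub fn try_writer(&self) -> Option<RefMut<'_, Vec<u8>>> {
        let mut guard = self.scratch.try_borrow_mut().ok()?;
        guard.clear();
        Some(guard)
    }
}

/// Simulates a decoder that asks for a second writer while the first
/// one is still live. Returns whether each request succeeded.
pub fn nested_decode(arena: &ScratchArena) -> (bool, bool) {
    let outer = arena.try_writer();
    let inner = arena.try_writer();
    (outer.is_some(), inner.is_some())
}
```

Under these assumptions the outer request succeeds and the nested one is refused, and the scratch becomes available again once the outer guard is dropped.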
