repr: add RowArenaBuf, a growable arena-backed byte buffer#37115
Open
frankmcsherry wants to merge 2 commits into
Open
repr: add RowArenaBuf, a growable arena-backed byte buffer#37115frankmcsherry wants to merge 2 commits into
frankmcsherry wants to merge 2 commits into
Conversation
f36ce65 to
5ed25c6
Compare
Adds `RowArena::writer()` returning a `RowArenaBuf`: a Vec-shaped, writeable byte buffer that builds a value into a single reusable scratch buffer on the arena. Write into it (push / extend_from_slice / io::Write / fmt::Write), read it back as `&[u8]`, and `finish()` / `finish_str()` copies the assembled bytes into a region (via `push_bytes`), returning a reference valid for the arena's lifetime. This lets a producer that assembles bytes piecewise — e.g. decoding a row, or formatting a cast-to-string result with `write!` — build incrementally without managing its own scratch allocation; the scratch is retained (cleared, capacity kept) across writers, so it stops allocating once it reaches its high-water mark. One writer may be live at a time. Tests cover incremental builds, io::Write and fmt::Write, scratch reuse across writers with the earlier result staying valid, finish_str, and empty/abandoned writers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`RowArena::writer` returns a `RowArenaBuf`, so the type is part of the public API and must be nameable by callers. Exporting it also fixes the `bin/doc` (`cargo doc --document-private-items`) failure where `writer`'s public docs linked to `RowArenaBuf` methods that rustdoc considered private because the type was unreachable from the crate root. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
e1a45ae to
d726eaa
Compare
Member
|
I think this is good but I have two questions:
I think this doesn't actually remove the commit-time copy right? This still solves the alloc churn problem though which is good. The other thing to think about is that writer() has a reentrancy risk which you could imagine doing a nested decode that tries to get a mutable reference to the same underlying stash |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds
RowArena::writer()→RowArenaBuf: aVec-shaped, writeable byte buffer backed by the arena. You build a value incrementally and commit it:It supports
push,extend_from_slice,std::io::Write, reads back as&[u8](viaDeref/as_slice), andfinish()commits the bytes into the arena and returns a reference valid for the arena's lifetime.Why
With the region-backed
RowArena(#37114),push_bytescopies an already-assembled slice into a region. A producer that assembles bytes piecewise — decoding a compressed row, say — would otherwise build into a scratch buffer and then havepush_bytescopy it in.RowArenaBuflets it write directly into arena storage, removing the scratch and the commit-time copy.Growth doubles capacity and recycles the previous allocation through a small bounded pool on the arena (
MAX_FREE_BUFFERS) rather than freeing it; an abandoned (un-finished) buffer is recycled on drop, andclearfeeds the same pool. So a steady-state producer reuses allocations across rows.Safety
The key invariant: no reference into the buffer escapes before
finish, so the buffer is free to relocate (double-and-copy) while being built. Onlyfinishhands out a&'a [u8], and from then on those bytes live in a committed region that is never reallocated while it holds data — the same rulepush_bytesrelies on in #37114.Tests
mz_ore::tests (run under miri) cover incremental builds across many growth steps,io::Write, pooled reuse across writers, reference validity after a later writer commits, and empty/abandoned writers.Stacking
Builds on #37114 (region-backed
RowArena). No call sites adopt the writer yet; wiring the per-column decode path onto it is a follow-up.No behavior change for existing callers; no release note.