[common] Fix BinaryRowSerializer reuse buffer never shrinking #8160
Merged
JingsongLi merged 3 commits intoJun 11, 2026
05c9f2d to 631e441
Updated the shrink logic to use a combined fixed-cap and ratio check, consistent with #8159. This replaces the previous fixed-threshold-only approach. Tests were updated to 6 cases covering the ratio-based scenarios.
JingsongLi reviewed Jun 11, 2026
```java
/**
 * Maximum retained reuse buffer size in bytes. Buffers exceeding this cap are eligible for
 * shrinking when the shrink ratio condition is also met.
 */
private static final int MAX_RETAINED_REUSE_BUFFER_SIZE = 4 * 1024 * 1024; // 4MB
```
badb0b8 to c55d98b
Purpose
Linked issue: #7620
During external merge sort, each merge channel holds a BinaryRow reuse instance via BinaryRowSerializer.deserialize(reuse, source). When a large record is deserialized, the backing MemorySegment grows to fit it but is never shrunk for subsequent small records. With max-num-file-handles (default 128) channels each retaining a 100MB+ buffer, memory usage explodes and the job runs into OOM.
Changes
BinaryRowSerializer: add shrink logic with combined cap and ratio hysteresis in deserialize(BinaryRow reuse, DataInputView source). The buffer is reallocated only when both conditions are met: the retained buffer exceeds MAX_RETAINED_REUSE_BUFFER_SIZE (4MB), and the shrink ratio condition also holds. The ratio-based approach avoids thrashing for sustained medium-to-large records while still reclaiming memory after a spike. This is consistent with the approach used in #8159.
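The combined check can be sketched roughly as follows. This is a hypothetical, capacity-only model, not the code in BinaryRowSerializer: MAX_RETAINED_REUSE_BUFFER_SIZE mirrors the 4MB constant from the review snippet, while SHRINK_RATIO (4 here), the shrink-to-exact-size policy, and the names ShrinkingReuseDemo, ReuseBuffer, prepare and shouldShrink are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of a reuse buffer with cap-plus-ratio shrink hysteresis. It models
 * capacity only (nothing is allocated); SHRINK_RATIO and all names are assumptions.
 */
public class ShrinkingReuseDemo {

    static final int MB = 1024 * 1024;

    /** Fixed cap, mirroring the 4MB constant from the PR. */
    static final int MAX_RETAINED_REUSE_BUFFER_SIZE = 4 * MB;

    /** Assumed ratio: shrink only if capacity is more than this many times the record size. */
    static final int SHRINK_RATIO = 4;

    static final class ReuseBuffer {
        private int capacity;

        /** Returns true if the buffer was reallocated (grown or shrunk) for this record. */
        boolean prepare(int recordSize) {
            if (capacity < recordSize) {
                capacity = recordSize; // grow to fit a larger record, as before
                return true;
            }
            if (shouldShrink(capacity, recordSize)) {
                capacity = recordSize; // release the oversized buffer after a spike
                return true;
            }
            return false; // keep reusing the existing buffer
        }

        int capacity() {
            return capacity;
        }
    }

    /** Both conditions must hold: above the fixed cap AND far larger than the record needs. */
    static boolean shouldShrink(int capacity, int recordSize) {
        return capacity > MAX_RETAINED_REUSE_BUFFER_SIZE
                && (long) capacity > (long) SHRINK_RATIO * recordSize; // long avoids overflow
    }

    public static void main(String[] args) {
        ReuseBuffer spike = new ReuseBuffer();
        spike.prepare(100 * MB); // one large record
        spike.prepare(64);       // followed by a small record
        System.out.println(spike.capacity()); // 64: memory reclaimed after the spike

        ReuseBuffer medium = new ReuseBuffer();
        medium.prepare(10 * MB);
        medium.prepare(6 * MB); // 10MB is not > 4 * 6MB, so no reallocation
        System.out.println(medium.capacity()); // 10485760

        ReuseBuffer small = new ReuseBuffer();
        small.prepare(3 * MB);
        small.prepare(64); // 3MB is under the cap, so it is retained
        System.out.println(small.capacity()); // 3145728
    }
}
```

In this sketch the cap keeps modest buffers from ever being reallocated, and the ratio term supplies the hysteresis: a sustained stream of records close to the current capacity never triggers a shrink/grow cycle, while a small record following a large spike does.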
Tests
BinaryRowSerializerShrinkTest: 6 test cases covering the ratio-based shrink scenarios.
API and Format
N/A
Documentation
N/A