[fix](be) Fix QuantileState protobuf serialization in write/read_column_to_pb#64152
[fix](be) Fix QuantileState protobuf serialization in write/read_column_to_pb#64152heguanhui wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
c3f87fe to
9f6743d
Compare
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29375 ms |
TPC-DS: Total hot run time: 169788 ms |
There was a problem hiding this comment.
The fix is correct: write/read_column_to_pb now serialize/deserialize the logical value via get_element()/serialize()/deserialize() + insert_value(), which is consistent with the existing write_column_to_arrow, write_column_to_orc, to_string paths in this file, and with the bitmap/hll SerDes. The added set_id(PGenericType::QUANTILE_STATE) and the switch to assert_cast are also correct. The same-class bug does not exist in the bitmap/hll SerDes — they were already correct — so QuantileState was the only outlier.
Four points to address before merge:
deserialize return value is ignored. QuantileState::deserialize returns bool. Since this PR fixes a data-corruption path, malformed input should surface as an error rather than be silently accepted:
QuantileState value;
if (!value.deserialize(Slice(arg.bytes_value(i)))) {
return Status::InternalError("Failed to deserialize QuantileState from pb");
}
col.insert_value(std::move(value));
The test's equality check is partial. Comparing only get_value_by_percentile(0.5/1.0) samples a few quantile points and can miss corruption that preserves those points. A round-trip test can assert exact equality by serializing both elements and comparing the byte strings (optionally also get_serialized_size()). Keep the percentile checks as a supplement if desired.
This is a wire-format change. The protobuf payload for QuantileState changes from fixed-size raw memory to variable-length serialized bytes. In a rolling upgrade, old and new BEs are incompatible on the RPC UDF / RPC aggregate / fold-constant paths. Real-world risk is low because the old format was already broken (it transmitted pointer values, not data) and never produced correct cross-node results. This PR currently has no backport labels — if it is backported, please call out the format change so the merge can ensure affected nodes are upgraded together.
|
/review |
There was a problem hiding this comment.
Requesting changes for one correctness issue in the new QuantileState protobuf reader.
Critical checkpoint conclusions:
- Goal: the PR aims to fix QuantileState protobuf serialization. The writer now uses the logical QuantileState serialization and sets the type id, but the reader still does not fail when that logical deserialization fails.
- Scope: the change is small and focused on the QuantileState serde plus its unit test.
- Concurrency/lifecycle/config: no new concurrency, special lifecycle, static initialization, or configuration behavior is introduced.
- Compatibility/protocol: this path crosses external RPC UDF/RPC aggregate and BE-to-FE constant-folding protobuf boundaries. The payload format now matches existing internal QuantileState binary serialization, but stale or malformed protobuf bytes must be rejected instead of converted to empty states.
- Parallel paths: storage/block QuantileState serialization already uses QuantileState::serialize; this PR brings protobuf toward that path. Other QuantileState input paths such as from_olap_string/base64 check deserialize failures.
- Tests: the expanded unit test covers normal EMPTY/SINGLE/EXPLICIT/TDIGEST round-trips and type id assignment. It does not cover malformed/stale protobuf payloads, which is the gap behind the inline finding.
- Observability/error handling: QuantileState::deserialize logs a warning on invalid slices, but read_column_from_pb currently returns OK and inserts the default EMPTY value, so callers cannot detect the failed decode.
- Transactions/persistence/data writes: not applicable to this PR.
- Performance: no blocking performance issue found in the modified protobuf loop.
User focus: no additional user-provided review focus was present.
I ran git diff --check for the PR diff; no whitespace errors were reported. I did not run the BE unit test suite in this review pass.
| for (int i = 0; i < arg.bytes_value_size(); ++i) { | ||
| column.insert_data(arg.bytes_value(i).c_str(), arg.bytes_value(i).size()); | ||
| QuantileState value; | ||
| value.deserialize(Slice(arg.bytes_value(i))); |
There was a problem hiding this comment.
PValues is not only read back from this same writer: RPCFnImpl::_convert_to_block() reads PFunctionCallResponse.result(0) from the RPC function server, and constant folding also moves PValues across process boundaries. If that payload is malformed, truncated, or produced by an implementation that still sends the old raw QuantileState object bytes, QuantileState::deserialize() returns false and leaves value as the default EMPTY state, but this code still inserts it and returns Status::OK(). That makes the query silently lose the quantile value instead of failing. Please propagate a non-OK Status here, and add a negative test with an invalid bytes_value.
| value.deserialize(Slice(arg.bytes_value(i))); | |
| if (!value.deserialize(Slice(arg.bytes_value(i)))) { | |
| return Status::InternalError( | |
| "deserialize QuantileState from protobuf fail at index {}", i); | |
| } |
|
PR approved by at least one committer and no changes requested. |
9f6743d to
47e0034
Compare
|
/review |
|
run buildall |
TPC-H: Total hot run time: 28983 ms |
TPC-DS: Total hot run time: 168300 ms |
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 28608 ms |
TPC-DS: Total hot run time: 168506 ms |
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
Issue Number: close #64149
Problem Summary:
DataTypeQuantileStateSerDe::write_column_to_pbusedget_data_at()to copysizeof(QuantileState)bytes of raw memory, butQuantileStatecontains heap-allocated members (shared_ptr<TDigest>,vector<double>) whose data is not stored inline.read_column_from_pbthen usedinsert_data()to reinterpret this incomplete memory as aQuantileStateobject, resulting in dangling pointers and data corruption.Additionally,
write_column_to_pbwas missing thePGenericType::QUANTILE_STATEtype id assignment.The test also had a bug: it compared
QuantileStateobjects viaget_data_at()(raw memory comparison), which compares pointer values rather than logical equality.Release note
Fixed QuantileState type corruption in RPC UDF / RPC aggregate function / fold constant paths. Previously, QuantileState values could be silently corrupted or lost when transmitted via protobuf.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)