feat(storage): implement GCS overrun logging and resume prevention#16150
feat(storage): implement GCS overrun logging and resume prevention#16150v-pratap wants to merge 8 commits into
Conversation
This commit implements overrun logging, satisfied-resume prevention, and server-side transcoding bypass for both the synchronous and asynchronous GCS download paths. Synchronous Path: - Introduced a private 'OverrunLoggingObjectReadSource' decorator inside 'object_read_streambuf.cc' that wraps the transport source to track received bytes lock-free. - Logs a warning with the exact overshoot byte count, bucket, and object name if GCS sends more bytes than requested. - Leaves 'object_read_streambuf.h' and 'object_read_stream.h' completely clean, untouched, and thread-safe. Asynchronous Path (gRPC BiDi Stream & Streaming Reader): - Integrated chunk-level byte tracking inside 'ReadRange' and 'AsyncReaderConnectionResume'. - Under server-side transcoding (gzip), suppresses overrun warnings and bypasses checksum validation failures (since decompressed bytes naturally exceed the compressed request and mismatch the original checksums). - Implemented satisfied-resume prevention: if a connection fails but the requested number of bytes has already been received, we refuse to reconnect to prevent GCS from sending unwanted extra bytes, and return Status::OK directly, masking any subsequent stream errors. - Defined overloaded constructors to maintain 100% backward compatibility for all existing tests and client code. Testing: - Added 15 new, comprehensive unit tests covering all overrun, tail-read, transcoding, and resume scenarios.
There was a problem hiding this comment.
Code Review
This pull request introduces overrun logging and transcoding detection when reading objects from GCS, updating classes like ReadRange, AsyncReaderConnectionResume, and ObjectReadStreambuf to track received bytes and suppress warnings or checksum validation for transcoded data. The code review identified several critical issues: data races in ObjectDescriptorImpl::OnRead due to accessing metadata_ after unlocking and in AsyncReaderConnectionResume::Reconnect due to accessing impl_ without a lock; potential undefined behavior from signed integer overflow when negating p.start at its minimum value; duplicated overrun check logic in OverrunLoggingObjectReadSource; and a bug where casting a negative requested_length_ to std::size_t causes an unsigned wrap-around, bypassing the overrun check.
- Fix data race on metadata_ in ObjectDescriptorImpl::OnRead by extracting is_transcoded status before unlocking the mutex. - Fix data race on impl_ in AsyncReaderConnectionResume::Reconnect by calling CurrentImpl() to safely acquire the lock. - Fix potential signed integer negation overflow undefined behavior in ObjectDescriptorImpl::Read by explicitly checking for numeric_limits min() and capping the limit. - De-duplicate overrun check logic in OverrunLoggingObjectReadSource by calling CheckOverrun() directly in Read(). - Fix potential signed-to-unsigned cast wrap-around bug in CheckOverrun() by checking that requested_length_ is non-negative before casting.
- Dynamically verify if server-side decompressive transcoding actually occurred by comparing total received bytes to the object's original stored size (when available). - Prevents silently ignoring real data corruption when a user explicitly requests compressed gzip-encoded bytes and receives a corrupted file. - Falls back to the standard is_transcoded flag if the object size is unknown (maintaining 100% backward compatibility and preventing test breakages). - Applied this robust check to both ReadRange (async BiDi range reads) and AsyncReaderConnectionResume (async streaming reader). - Added a new unit test 'TranscodingFailsOnRealChecksumMismatch' to verify that a checksum mismatch is correctly reported as an error if the received size matches the compressed object size. - Resolved a compiler signedness warning in read_range.cc by casting object_size to std::size_t. - Made the 'logged_warning_' flag in ObjectReadStreambuf a std::atomic<bool> to ensure absolute thread-safety.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #16150 +/- ##
==========================================
+ Coverage 92.20% 92.21% +0.01%
==========================================
Files 2264 2264
Lines 208864 209497 +633
==========================================
+ Hits 192579 193191 +612
- Misses 16285 16306 +21 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
- Refactored overrun warning logs to print once at the end of the download (stream Close or completion/failure) rather than immediately on the first overrunning chunk. - Aligns the C++ GCS client's logging behavior with the official Google Cloud Go client (cloud.google.com/go/storage). - Ensures that the warning log reports the true, final total overrun byte count instead of a partial overrun count from the first chunk. - Sync Path: Removed CheckOverrun from the hot Read() path in ObjectReadStreambuf, moving it to Close() and the decorator destructor (~OverrunLoggingObjectReadSource). - Async Range Path: Deferred CheckOverrun in ReadRange to the final range_end and OnFinish paths. - Async Resume Path: Deferred CheckOverrun in AsyncReaderConnectionResume to the final status.ok, satisfied, and stop paths. - Updated all unit tests in object_read_streambuf_test.cc, read_range_test.cc, and reader_connection_resume_test.cc to drive the streams to completion (EOF/Close) before verifying the logs.
- Converted the logged_warning_ check-and-set in ObjectReadStreambuf to a thread-safe atomic exchange (!logged_warning_.exchange(true)) to prevent any potential check-then-act race conditions. - Added explicit static_cast<std::size_t> casts to requested_length_ in both ReadRange and AsyncReaderConnectionResume overrun logging statements. This prevents signed/unsigned mixing arithmetic and conforms perfectly to strict compiler sign-comparison flags. - Documented and verified architectural correctness of total_received_bytes_ relative counting (essential for ranged reads starting at non-zero offsets) and the >0 read_limit check (mandatory for GCS proto3 default semantics).
…isfy GCB checkers
| std::move(*response->mutable_read_handle()); | ||
| } | ||
| auto copy = it->active_ranges; | ||
| // Release the lock while notifying the ranges. The notifications may trigger |
There was a problem hiding this comment.
nit: move this comment before lk.unlock()?
|
|
||
| auto range = std::make_shared<ReadRange>(p.start, p.length, hash_function, | ||
| std::move(hash_validator)); | ||
| absl::optional<std::int64_t> limit; |
There was a problem hiding this comment.
nit: pls add a comment about how limit is calculated for readers who are unfamiliar with GCS range request behaviors.
| std::string object_name_; | ||
| std::size_t received_bytes_ = 0; | ||
| bool is_transcoded_ = false; | ||
| std::atomic<bool> logged_warning_{false}; |
There was a problem hiding this comment.
Why use std::atomic on just this member? If concurrent access is possible, this class has data races on its other members.
| CheckOverrun(); | ||
| auto result = std::move(*hash_validator_).Finish(hash_function_->Finish()); | ||
| if (result.is_mismatch) { | ||
| bool transcoded_download = |
There was a problem hiding this comment.
nit: add a comment explaining why we're performing this check. Same comment for this change under reader_connection_resume.cc.
| absl::optional<std::int64_t> requested_length_; | ||
| std::string bucket_name_; | ||
| std::string object_name_; | ||
| std::size_t received_bytes_ = 0; |
There was a problem hiding this comment.
Can/should received_bytes_ be std::int64_t similar to total_received_bytes_ under reader_connection_resume.h? This might reduce some of the casting in the implementation.
|
Overall looks good - added some minor comments. |
This commit implements overrun logging, satisfied-resume prevention, and server-side transcoding bypass for both the synchronous and asynchronous GCS download paths.
Synchronous Path:
Asynchronous Path (gRPC BiDi Stream & Streaming Reader):