
feat(storage): add resource span attributes for ACO ( App Centric Observability )#16119

Open
bajajneha27 wants to merge 17 commits into
googleapis:mainfrom
bajajneha27:509338299

Conversation

@bajajneha27

Contributor

No description provided.

@product-auto-label product-auto-label Bot added the api: storage Issues related to the Cloud Storage API. label May 27, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a private helper method EnrichSpan to populate OpenTelemetry span attributes (gcp.resource.destination.id and gcp.resource.destination.location) using bucket metadata upon successful bucket operations (such as creation, retrieval, updates, and locking). It also adds corresponding unit tests to verify these attributes. The review comments suggest making EnrichSpan static since it does not access member variables, and checking for an uninitialized project number (value 0) to avoid generating invalid resource IDs.
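To make the two review suggestions concrete, here is a minimal sketch of a helper that reads no member state (so it could be static) and skips the resource ID when the project number is 0. `FakeSpan` and `BucketInfo` are hypothetical stand-ins for `opentelemetry::trace::Span` and the bucket metadata, and the resource ID format is an assumption, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for opentelemetry::trace::Span, so the sketch
// compiles without the OpenTelemetry SDK.
struct FakeSpan {
  std::map<std::string, std::string> attributes;
  void SetAttribute(std::string const& key, std::string const& value) {
    attributes[key] = value;
  }
};

// Hypothetical subset of the bucket metadata fields used here.
// A project_number of 0 is treated as "not populated".
struct BucketInfo {
  std::string name;
  std::string location;
  std::uint64_t project_number;
};

// Reads no connection state, so it can be a static or free function.
// The resource ID format below is an assumption for illustration only.
inline void EnrichSpan(FakeSpan& span, BucketInfo const& bucket) {
  if (bucket.project_number != 0) {
    span.SetAttribute("gcp.resource.destination.id",
                      "//storage.googleapis.com/projects/" +
                          std::to_string(bucket.project_number) +
                          "/buckets/" + bucket.name);
  }
  span.SetAttribute("gcp.resource.destination.location", bucket.location);
}
```

In this sketch the location attribute is still set when the project number is missing; whether the real helper should skip both attributes is a design choice the review does not settle.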

Comment thread google/cloud/storage/internal/tracing_connection.cc Outdated
Comment thread google/cloud/storage/internal/tracing_connection.h Outdated
@codecov

codecov Bot commented May 27, 2026


Codecov Report

❌ Patch coverage is 97.92453% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.20%. Comparing base (fd73bee) to head (52c1762).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
.../cloud/storage/internal/tracing_connection_test.cc 95.81% 8 Missing ⚠️
...oogle/cloud/storage/internal/tracing_connection.cc 99.06% 2 Missing ⚠️
...le/cloud/storage/internal/bucket_metadata_cache.cc 98.27% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #16119    +/-   ##
========================================
  Coverage   92.20%   92.20%            
========================================
  Files        2264     2267     +3     
  Lines      208864   209341   +477     
========================================
+ Hits       192579   193033   +454     
- Misses      16285    16308    +23     


@bajajneha27 bajajneha27 force-pushed the 509338299 branch 3 times, most recently from b9d9575 to 53127d4 Compare June 3, 2026 08:09
@bajajneha27 bajajneha27 force-pushed the 509338299 branch 2 times, most recently from c978382 to ab9f016 Compare June 9, 2026 10:56
@bajajneha27 bajajneha27 marked this pull request as ready for review June 9, 2026 15:26
@bajajneha27 bajajneha27 requested review from a team as code owners June 9, 2026 15:26
Comment thread google/cloud/storage/internal/tracing_connection.cc Outdated
Comment thread google/cloud/storage/internal/bucket_metadata_cache.cc Outdated
return internal::EndSpan(*span, impl_->GetObjectMetadata(request));
EnrichSpan(*span, request.bucket_name());
auto result = impl_->GetObjectMetadata(request);
MaybeInvalidate(result, request.bucket_name());

Contributor


MaybeInvalidate is called on almost all operations, including object-level operations. If a user requests a non-existent object in a valid bucket, the operation returns 404, and the bucket is evicted from the cache.

We should either call MaybeInvalidate only on bucket-level operations, where a 404 guarantees the bucket is gone, or check status.message() to distinguish a bucket 404 from an object 404 (this approach is somewhat brittle).

We could also do what Python does in this case by evicting only if the bucket is truly gone: https://github.com/googleapis/google-cloud-python/blob/384724c2d4c955e15274e9824bcdb93c685b79f6/packages/google-cloud-storage/google/cloud/storage/_bucket_metadata_cache.py#L68.

Contributor Author


I'll make this change once we decide on the background thread discussion.

Contributor Author


I can invalidate the cache only on bucket operations, and not on object operations.
If we want to follow how it's done in Python, we'd need to make an extra API call to check whether the bucket exists.

Contributor


Let's invalidate the cache only on bucket operations for now. We can revisit this in the future, if we want to be more precise.
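The agreed rule (evict only when a bucket-level operation returns 404) could look roughly like the sketch below. `Code`, `OpKind`, and `BucketNameCache` are hypothetical names for illustration; the real code would use `google::cloud::StatusCode` and the PR's own cache type.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified status codes; the real code would use
// google::cloud::StatusCode.
enum class Code { kOk, kNotFound, kPermissionDenied };

// Whether the operation targets the bucket itself or an object in it.
enum class OpKind { kBucket, kObject };

// Hypothetical cache that tracks only bucket names.
class BucketNameCache {
 public:
  void Insert(std::string const& bucket) { buckets_.insert(bucket); }

  bool Contains(std::string const& bucket) const {
    return buckets_.count(bucket) != 0;
  }

  // Evict only when a bucket-level operation reports "not found".
  // An object-level 404 says nothing about the bucket, so it is ignored.
  void MaybeInvalidate(Code code, OpKind kind, std::string const& bucket) {
    if (code == Code::kNotFound && kind == OpKind::kBucket) {
      buckets_.erase(bucket);
    }
  }

 private:
  std::set<std::string> buckets_;
};
```

The trade-off is that a deleted bucket stays cached until some bucket-level call observes the 404, which avoids both the extra existence check and the message parsing.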

}

auto current_options = google::cloud::internal::SaveCurrentOptions();
auto f = std::async(std::launch::async, [this, bucket_name,

Contributor


std::async spawns threads dynamically, which causes the destructor of TracingConnection to block waiting for all background tasks to complete. Is there an alternate way?

Instead of spawning a new thread dynamically for every cache miss via std::async, can TracingConnection manage a single, long-lived background worker thread?
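As a rough illustration of the single-worker alternative, the sketch below keeps one long-lived thread that drains a task queue. `BackgroundWorker` and `Post` are hypothetical names, not part of the PR or of google-cloud-cpp.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-threaded task runner.
class BackgroundWorker {
 public:
  BackgroundWorker() : thread_([this] { Run(); }) {}

  // Signals shutdown, lets the worker drain queued tasks, then joins.
  ~BackgroundWorker() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      shutdown_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // Queues a task; tasks run one at a time, in submission order.
  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      cv_.wait(lk, [this] { return shutdown_ || !queue_.empty(); });
      if (queue_.empty()) return;  // shutdown requested, nothing left to run
      auto task = std::move(queue_.front());
      queue_.pop_front();
      lk.unlock();  // do not hold the lock while running user code
      task();
      lk.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool shutdown_ = false;
  std::thread thread_;  // declared last so the other members exist first
};
```

The destructor here still waits for queued work, but only one thread is ever created. The cost is that tasks run serially, so fetches for different buckets would not overlap.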

Contributor Author


We can, and that would probably be a better option. However, metadata fetches on cache misses would then not run concurrently. That may be acceptable, since I expect cache misses to be infrequent. WDYT?

Contributor Author


As discussed offline, we'll keep the current approach of having background threads so that the bucket metadata fetch can happen concurrently for different buckets.

The only concern here was that the threads should have some sort of deadline or timeout. I think the retry_policy we have configured takes care of that: as soon as retries are exhausted, the thread ends too.


Just ensure we don't retry in case of permission errors. I think this should already be there but good to check.

Contributor


The PR protects against multiple fetches for the same bucket (via in_flight_fetch_). However, there is no global limit on the total number of threads. RPC timeouts and retry policies do not solve the performance problems that come with constant thread creation and destruction.

Let's keep this comment open for now until we are confident with this approach. Will move this discussion offline.
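One possible shape for combining per-bucket deduplication with a global cap is sketched below. `FetchGate`, `TryBegin`, and `End` are hypothetical names; the PR's `in_flight_fetch_` member may work differently, and the cap is an assumption added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical gate deciding whether a background fetch may start.
class FetchGate {
 public:
  explicit FetchGate(std::size_t max_in_flight)
      : max_in_flight_(max_in_flight) {}

  // Returns true if the caller should start a fetch for `bucket`.
  // Returns false if one is already running for that bucket, or if the
  // global cap on concurrent fetches has been reached.
  bool TryBegin(std::string const& bucket) {
    std::lock_guard<std::mutex> lk(mu_);
    if (in_flight_.size() >= max_in_flight_) return false;
    return in_flight_.insert(bucket).second;
  }

  // Must be called once the fetch for `bucket` finishes or fails.
  void End(std::string const& bucket) {
    std::lock_guard<std::mutex> lk(mu_);
    in_flight_.erase(bucket);
  }

 private:
  std::mutex mu_;
  std::size_t max_in_flight_;
  std::set<std::string> in_flight_;
};
```

A rejected fetch would simply leave the span without the extra attributes until a later call succeeds. The gate bounds how many threads exist at once, but not how often they are created.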

bg_tasks_.end());
}

void TracingConnection::EnrichSpan(opentelemetry::trace::Span& span,

Contributor


We also need to provide an option to disable this feature.

Contributor Author


I added a new option in options.h and kept it enabled by default.

Contributor


@cpriti-os is the plan for all clients to keep this behavior on or off by default?

@kalragauri kalragauri requested a review from cpriti-os June 10, 2026 09:39

class BucketMetadataCache {
public:
explicit BucketMetadataCache(std::size_t max_size = 10000)

Contributor


IIRC, there was some discussion to make this value configurable. Let's make sure this value/behavior is consistent across languages.
cc: @cpriti-os


@kalragauri I don't think we need to make this configurable. Reasonably low memory consumption should be good enough, along with a flag to disable the feature altogether. Since this isn't exactly a user-facing feature, exposing many options and configurations for internal logic could confuse users.
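For reference, a fixed-capacity cache with the same `max_size = 10000` default could be sketched as follows. `BoundedBucketCache` is a hypothetical name, and least-recently-used eviction is an assumption; the PR's eviction policy is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache mapping bucket name -> location.
// LRU eviction is an assumption made for this sketch.
class BoundedBucketCache {
 public:
  explicit BoundedBucketCache(std::size_t max_size = 10000)
      : max_size_(max_size) {}

  // Inserts or updates an entry, evicting the least recently used one
  // when the cache is full.
  void Put(std::string const& bucket, std::string location) {
    auto it = index_.find(bucket);
    if (it != index_.end()) {
      it->second->second = std::move(location);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (max_size_ == 0) return;  // a zero-sized cache stores nothing
    if (index_.size() >= max_size_) {
      index_.erase(order_.back().first);
      order_.pop_back();
    }
    order_.emplace_front(bucket, std::move(location));
    index_[bucket] = order_.begin();
  }

  // Returns nullptr on a miss; a hit marks the entry as most recently used.
  std::string const* Get(std::string const& bucket) {
    auto it = index_.find(bucket);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);
    return &it->second->second;
  }

  std::size_t size() const { return index_.size(); }

 private:
  using Entry = std::pair<std::string, std::string>;
  std::size_t max_size_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With two short strings per entry, 10000 entries stay within a few megabytes at most, which fits the goal of low memory use without exposing a tuning knob.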


3 participants