feat(sidecar): enable telemetry for stats#2175
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: fd8554e | Docs | Datadog PR Page | Give us feedback! |
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
8fcfab9 to
94ad5e8
Compare
1f87e19 to
4c38bb8
Compare
4c38bb8 to
6413c5f
Compare
94ad5e8 to
5706ea9
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c38bb8b78
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let instance_id = InstanceId { | ||
| session_id: runtime_id.to_owned(), | ||
| runtime_id: runtime_id.to_owned(), |
There was a problem hiding this comment.
Use a real runtime id when seeding stats telemetry
When the first IPC fallback span for a service/env arrives before any telemetry client exists, add_span_to_concentrator passes the connection session id as this runtime_id, so this block creates a cached telemetry client whose InstanceId.runtime_id is also the session id. TelemetryCachedClient::new copies that field into builder.runtime_id, and the client is then reused by service/env, so subsequent normal telemetry for the app reports the wrong runtime id. Avoid creating a new telemetry client from this fallback without a real InstanceId, or pass the actual runtime id through the IPC path.
Useful? React with 👍 / 👎.
| .map(|s| { | ||
| ( | ||
| make_exporter(s, s.endpoint.clone(), Duration::from_secs(10)), | ||
| make_exporter(s, s.endpoint.clone(), Duration::from_secs(10), None), |
There was a problem hiding this comment.
Pass telemetry through explicit stats flushes
When ddog_sidecar_flush drains stats via flush(... traces_and_stats: true ...) before the periodic loop runs, this exporter is built with None, so StatsExporter::send cannot emit the datadog.tracer.stats.collapsed_spans telemetry for collapsed spans in that flush. The buckets are drained by this send, so the periodic exporter that does have a worker handle never gets another chance to report them; the flush path should use the concentrator's telemetry worker as well.
Useful? React with 👍 / 👎.
|
|
||
| let exporter = make_exporter( | ||
| &state, | ||
| state.endpoint.clone(), | ||
| flush_interval, |
There was a problem hiding this comment.
Refresh stats telemetry after client stops
This binds the flush loop to the telemetry worker that existed when the concentrator was created, but telemetry clients are removed on LifecycleAction::Stop while SHM concentrators are global and can remain active until an idle flush removes them. If another runtime for the same service/env continues using the existing concentrator after a Stop/start cycle, get_or_create_concentrator returns the old state and collapsed-span points keep going to the stopped worker instead of the newly created telemetry client; look up or refresh the worker when flushing rather than capturing it for the concentrator lifetime.
Useful? React with 👍 / 👎.
|
|
||
| let exporter = make_exporter( | ||
| &state, | ||
| state.endpoint.clone(), | ||
| flush_interval, |
There was a problem hiding this comment.
Share one collapsed-spans metric context
When one service/env has multiple active stats concentrators, such as different versions because the concentrator key includes version, each per-concentrator exporter registers a fresh datadog.tracer.stats.collapsed_spans context on the same telemetry worker. The telemetry buckets aggregate by ContextKey, not by metric name/tags, so collapsed-span counts from those exporters are flushed as separate identical series instead of a single count; cache the metric context per worker and share it across exporters.
Useful? React with 👍 / 👎.
| let trace_config = session.get_trace_config(); | ||
| let runtime_metadata = RuntimeMetadata::new( | ||
| trace_config.language.clone(), | ||
| trace_config.language_version.clone(), | ||
| trace_config.tracer_version.clone(), | ||
| ); | ||
| drop(trace_config); |
There was a problem hiding this comment.
| let trace_config = session.get_trace_config(); | |
| let runtime_metadata = RuntimeMetadata::new( | |
| trace_config.language.clone(), | |
| trace_config.language_version.clone(), | |
| trace_config.tracer_version.clone(), | |
| ); | |
| drop(trace_config); | |
| let runtime_metadata = { | |
| let trace_config = session.get_trace_config(); | |
| RuntimeMetadata::new( | |
| trace_config.language.clone(), | |
| trace_config.language_version.clone(), | |
| trace_config.tracer_version.clone(), | |
| ) | |
| }; |
5706ea9 to
dae98bf
Compare
6413c5f to
9d522ee
Compare
| || {session | ||
| .session_config | ||
| .lock_or_panic() | ||
| .as_ref() | ||
| .cloned() | ||
| .unwrap_or_else(|| { | ||
| warn!("Session telemetry config unavailable for env={env} version={version} service={service_name}; telemetry disabled in stats"); | ||
| Config::default() | ||
| }) | ||
| }, |
There was a problem hiding this comment.
| || {session | |
| .session_config | |
| .lock_or_panic() | |
| .as_ref() | |
| .cloned() | |
| .unwrap_or_else(|| { | |
| warn!("Session telemetry config unavailable for env={env} version={version} service={service_name}; telemetry disabled in stats"); | |
| Config::default() | |
| }) | |
| }, | |
| || { | |
| session | |
| .session_config | |
| .lock_or_panic() | |
| .as_ref() | |
| .cloned() | |
| .unwrap_or_else(|| { | |
| warn!( | |
| "Session telemetry config unavailable for env={env} \ | |
| version={version} service={service_name}; telemetry \ | |
| disabled in stats" | |
| ); | |
| Config::default() | |
| }) | |
| }, |
There was a problem hiding this comment.
I was wondering why cargo fmt didn't format this part and according to claude:
when call args contain complex closures and rustfmt can't find valid layout within width, it falls back to leaving the expression untouched rather than producing malformed output.
with that format, cargo fmt works again.
(btw I checked and escaped newlines in strings are also un-indented so the log still looks like before)
9d522ee to
fd8554e
Compare
dae98bf to
3a03e6a
Compare

What does this PR do?
Enable span concentrator telemetry in sidecar.