Expand attribute value type to support complex values everywhere and cleanup surrounding code #5266
DylanRussell wants to merge 34 commits
Pull request overview
This PR expands OpenTelemetry Python’s attribute value model across the API, SDK, and OTLP exporters to support “complex” values everywhere (e.g., None, heterogeneous arrays, and maps), aligning behavior with the referenced OTEP and updating surrounding utilities/tests accordingly.
Changes:
- Broadens `AnyValue`/`AttributeValue`/`Attributes` typing and updates public API surfaces (tracing/logs/events/metrics) to accept complex attribute values.
- Refactors attribute cleaning/storage (`BoundedAttributes`) and metric attribute-keying (`_hash_attributes`) to handle complex values consistently.
- Updates OTLP proto/json encoding to represent `None` as an empty `AnyValue` and adjusts unit tests/benchmarks for the new semantics.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| opentelemetry-sdk/tests/trace/test_trace.py | Updates span attribute/event tests for complex values and bytes behavior. |
| opentelemetry-sdk/tests/resources/test_resources.py | Updates resource attribute validation tests for new cleaning/stringify behavior. |
| opentelemetry-sdk/tests/metrics/test_view_instrument_match.py | Updates metrics tests to use _hash_attributes aggregation keys. |
| opentelemetry-sdk/tests/metrics/test_measurement_consumer.py | Adjusts a concurrency test to pass attributes explicitly. |
| opentelemetry-sdk/src/opentelemetry/sdk/util/instrumentation.py | Switches instrumentation scope attributes typing to Attributes. |
| opentelemetry-sdk/src/opentelemetry/sdk/resources/__init__.py | Aligns resource attribute typing/imports with new Attributes definition. |
| opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py | Introduces _hash_attributes and deep-copies measurement attributes prior to aggregation. |
| opentelemetry-sdk/src/opentelemetry/sdk/environment_variables/__init__.py | Clarifies docs for attribute value length limits with string/bytes focus. |
| opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/_exceptions.py | Updates exception-attribute merging to the unified Attributes type. |
| opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py | Updates logging translation checks/warnings for AnyValue casting. |
| opentelemetry-sdk/src/opentelemetry/sdk/_events/__init__.py | Updates event logger provider attribute typing to Attributes. |
| opentelemetry-sdk/benchmarks/metrics/test_benchmark_metrics_histogram.py | Expands benchmarks to cover complex/mapping/array attribute shapes. |
| opentelemetry-api/tests/logs/test_proxy.py | Updates logs proxy tests to Attributes. |
| opentelemetry-api/tests/events/test_proxy_event.py | Updates events proxy tests to Attributes. |
| opentelemetry-api/tests/attributes/test_attributes.py | Reworks attribute tests around new _clean_attribute_value/BoundedAttributes. |
| opentelemetry-api/src/opentelemetry/util/types.py | Redefines AnyValue/AttributeValue and widens Attributes accordingly; removes _ExtendedAttributes. |
| opentelemetry-api/src/opentelemetry/trace/span.py | Updates span API type hints to use collections.abc.Mapping and types.Attributes in NoOp implementation. |
| opentelemetry-api/src/opentelemetry/attributes/__init__.py | Major refactor: new recursive cleaner and BoundedAttributes as a dict subclass. |
| opentelemetry-api/src/opentelemetry/_logs/_internal/__init__.py | Updates log API typing from _ExtendedAttributes to Attributes. |
| opentelemetry-api/src/opentelemetry/_events/__init__.py | Updates event API typing from _ExtendedAttributes to Attributes. |
| exporter/opentelemetry-exporter-otlp-proto-grpc/benchmarks/test_benchmark_trace_exporter.py | Updates patch target to TraceServiceStub. |
| exporter/opentelemetry-exporter-otlp-proto-common/tests/test_log_encoder.py | Refactors OTLP proto log encoder tests around new null/AnyValue behavior. |
| exporter/opentelemetry-exporter-otlp-proto-common/tests/test_attribute_encoder.py | Adjusts encoder failure tests to use an unencodable object. |
| exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py | Stops using allow_null; encodes null as empty AnyValue. |
| exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py | Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue. |
| exporter/opentelemetry-exporter-otlp-json-common/tests/test_log_encoder.py | Updates JSON log encoder tests for null-as-empty-AnyValue body. |
| exporter/opentelemetry-exporter-otlp-json-common/tests/test_common_encoder.py | Updates JSON common encoder tests for null and unencodable values. |
| exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/_log_encoder/__init__.py | Removes allow_null usage in JSON log encoding. |
| exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/__init__.py | Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue. |
| docs/conf.py | Updates Sphinx type-hint resolution aliasing; now also touches metrics internals. |
| .changelog/5266.added | Adds changelog entry for extended/complex attribute support across packages. |
Comments suppressed due to low confidence (1)
opentelemetry-api/src/opentelemetry/trace/span.py:96
- The note in `Span.set_attributes` says the behavior of `None` value attributes is undefined, but this PR explicitly adds support for `None`/null attribute values across the API/SDK. Please update the docstring to reflect the new supported semantics (even if `None` remains discouraged).
    def set_attributes(
        self, attributes: Mapping[str, types.AttributeValue]
    ) -> None:
        """Sets Attributes.
        Sets Attributes with the key and value passed as arguments dict.
        Note: The behavior of `None` value attributes is undefined, and hence
        strongly discouraged. It is also preferred to set attributes at span
        creation, instead of calling this method later since samplers can only
        consider information already present during span creation.
        """
| attributes = {} | ||
| if measurement.attributes: | ||
| # Deepcopy to prevent the user from modifying the attributes after the | ||
| # measurement has been consumed. | ||
| attributes = copy.deepcopy(measurement.attributes) |
Using BoundedAttributes for metrics appears to be completely optional, so we can't rely on that to ensure attribute values (and the attributes Map as a whole) are immutable.
attributes is always just a map which can be mutated... I don't see how a shallow copy is better than a deep copy.
I do share this concern though. We would save a lot of copying by just using the mutable types passed through.
Wasn't this code preexisting anyway?
I cleaned up this code a tiny bit.
I'm pretty sure this was the intent of the previous code (because there was a simple test that failed), which was:
    elif measurement.attributes is not None:
        attributes = dict(measurement.attributes)
But that doesn't copy nested lists/dicts.
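The difference between the two copies discussed above can be illustrated with a small standalone sketch. The attribute names and values are made up for illustration and are not taken from the PR.

```python
import copy

# Hypothetical measurement attributes containing a nested list.
original = {"http.route": "/users", "tags": ["a", "b"]}

shallow = dict(original)        # copies only the top-level mapping
deep = copy.deepcopy(original)  # also copies nested containers

# The caller mutates the nested list after the measurement was recorded.
original["tags"].append("c")

shallow_tags = shallow["tags"]  # still the same list object as original["tags"]
deep_tags = deep["tags"]        # independent copy, unaffected by the append
```

With `dict(...)` the later mutation leaks into the stored attributes; with `copy.deepcopy` it does not, at the cost of copying every nested container on each measurement.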
MikeGoldsmith left a comment
Overall looks good. I am concerned this is a fairly big change so we need to make sure it's done safely.
| # For more details, refer to the OTel specification: | ||
| # https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#type-any | ||
| AnyValue = ( | ||
| AnyValue = AttributeValue = ( |
Making AttributeValue an AnyValue greatly increases the available types. Should this be considered a breaking change? Anyone who used isinstance could see very different results. I don't disagree with the change, but I think it needs to be more loudly documented.
Ack. I've now added a clear description of how it's being expanded to the changelog. Is there something else you think we should do?
This shouldn't be considered a breaking change if we're expanding the range of acceptable types.
| ) | ||
| Attributes = Mapping[str, AttributeValue] | None | ||
| Attributes = Mapping[str, AnyValue] | None | ||
| # Not sure of the purpose of this type.. this is likely to confuse people.. |
If it's not used, should we remove it?
I am in favor of removing it. It's technically a breaking change because there's no _ prefix, so it's public, but I don't see the use case for it. I doubt people are using it.
It seems like this is used for logs. We should leave it around for backwards compatibility.
Actually its only usage is in a very stale .pyi file that should be deleted. No usages in contrib. Also the type itself doesn't make sense. I vote we delete it.
| # Provide AnyValue in opentelemetry.attributes module's namespace so the | ||
| # "AnyValue" forward reference in opentelemetry.util.types._ExtendedAttributes | ||
| # TODO: Is there a better way to do this ? I have to do this re-import thing |
What is this comment about, can we remove it?
@raajheshkannaa do you know a better way to resolve this issue ? I tried copying this approach but I'm still getting failures and it's a pain
| values=[_encode_key_value(str(k), v) for k, v in value.items()] | ||
| ) | ||
| ) | ||
| raise Exception(f"Invalid type {type(value)} of value {value}") |
We should probably convert to string here and potentially log a warning:
Any other values not listed above SHOULD be converted to AnyValue's string_value field if the source data can be serialized to a string (can be stringified) using toString() or stringify functions available in programming languages.
If the source data cannot be serialized to a string then the value SHOULD be converted AnyValue's bytes_value field by serializing it into a byte sequence by any means available.
If the source data cannot be serialized neither to a string nor to a byte sequence then it SHOULD by converted to an empty AnyValue.
Never mind this should be handled earlier
| _logger = logging.getLogger(__name__) | ||
| class _DuplicateFilter(logging.Filter): |
I don't like duplicating this duplicate filter here, and having to add it to the logger here.
Just leave it out ? These logs could be noisy. I can't import the other DuplicateFilter without taking a dependency on opentelemetry-sdk. There isn't a shared code package I can use either.
| """ | ||
| if isinstance(value, (type(None), bool, int, float)): | ||
| return value | ||
| if isinstance(value, bytes): |
Is this what the spec says should happen? In my view, if bytes are passed in, we should leave them as-is.
The spec doesn't say to do it. The current code converts bytes to str this way though, if we change it could break people.
I suppose an alternative is to do it like this temporarily (and log a warning), saying that in the future we will leave the field as bytes and then switch over after a few releases.. WDYT?
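For reference, the bytes handling described in the PR description (decode as UTF-8, pass the raw bytes through if decoding fails) could look roughly like the sketch below. `clean_bytes` is a hypothetical helper name, not a function from the PR.

```python
from typing import Union


def clean_bytes(value: bytes) -> Union[str, bytes]:
    """Hypothetical helper: decode UTF-8 bytes, otherwise keep the raw bytes."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the value as bytes instead of dropping it.
        return value


decoded = clean_bytes(b"hello")       # valid UTF-8, becomes a str
passthrough = clean_bytes(b"\xff\xfe")  # invalid UTF-8, stays bytes
```

A deprecation period, as suggested above, would presumably add a warning on the decode path before switching to always returning `bytes`.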
| # converting to string raises. | ||
| ) != _InvalidAttributeValue.INVALID_VALUE: | ||
| cleaned_mapping[key] = cleaned_value | ||
| return MappingProxyType(cleaned_mapping) |
Why is MappingProxyType needed here?
Not sure if it's needed TBH.. It's immutable.. Since Mappings/Sequences can be mutable, do we want to allow people to mutate these outside of __setitem__ and have the attribute value be mutated? My thinking was no because that bypasses the validation / cleaning of attributes.. But the user would have to reach into this data structure and get the Mapping/Sequence and then mutate it, and if they do that maybe it's intentional.. This also introduces some overhead obviously because we make a copy.. WDYT?
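As a quick illustration of the trade-off in this thread, here is a minimal sketch of what `types.MappingProxyType` does and does not guarantee. The attribute keys and values are invented for the example.

```python
from types import MappingProxyType

cleaned = {"region": "us-east1", "retries": 3}
view = MappingProxyType(cleaned)

# Writes through the proxy are rejected with a TypeError.
try:
    view["region"] = "eu-west1"
    write_blocked = False
except TypeError:
    write_blocked = True

# The proxy is only a read-only view: changes made to the underlying dict
# are still visible through it, so the dict itself must not be shared.
cleaned["retries"] = 5
seen_through_view = view["retries"]
```

The proxy also does not freeze nested values, so a nested list inside the mapping would remain mutable unless it is converted (for example to a tuple) during cleaning.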
| try: | ||
| return str(value) | ||
| except Exception: | ||
| raise TypeError( | ||
| f"Invalid type {type(value).__name__} for attribute value. " | ||
| f"Expected one of {[_type_name(valid_type) for valid_type in _VALID_ANY_VALUE_TYPES]} or a " | ||
| "sequence of those types", | ||
| ) | ||
| def _clean_extended_attribute( | ||
| key: str, value: types.AnyValue, max_len: int | None | ||
| ) -> types.AnyValue: | ||
| """Checks if attribute value is valid and cleans it if required. | ||
| The function returns the cleaned value or None if the value is not valid. | ||
| except Exception: # pylint: disable=broad-exception-caught | ||
| return _InvalidAttributeValue.INVALID_VALUE |
The spec states that if we can't convert to string, we should fall back to an empty AnyValue:
If the source data cannot be serialized neither to a string nor to a byte sequence then it SHOULD by converted to an empty AnyValue.
Ok so in accordance with that I switched the logic to fallback to None which will ultimately get converted to an empty AnyValue..
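The fallback described here (try `str()`, and fall back to `None` when that raises) might be sketched as follows. `stringify_or_none` and `Unprintable` are hypothetical names used only for illustration, and the warning text is an assumption.

```python
import logging
from pathlib import PurePosixPath
from typing import Optional

_logger = logging.getLogger(__name__)


def stringify_or_none(value: object) -> Optional[str]:
    """Hypothetical fallback for values that are not a valid AnyValue type."""
    try:
        return str(value)
    except Exception:  # broad on purpose: __str__ may raise anything
        _logger.warning(
            "Could not convert %s to str; using None", type(value).__name__
        )
        # None is later exported as an empty AnyValue.
        return None


class Unprintable:
    """Test object whose __str__ always fails."""

    def __str__(self) -> str:
        raise RuntimeError("no string form")


as_text = stringify_or_none(PurePosixPath("a/b"))
as_none = stringify_or_none(Unprintable())
```

The sketch skips the intermediate "serialize to bytes" step from the quoted spec text, since the thread only mentions the string and empty-value cases.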
| type(value), | ||
| ) | ||
| try: | ||
| return str(value) |
We should probably define a helper type here, and use an isinstance() check before serializing.
    @runtime_checkable
    class Stringable(Protocol):
        def __str__(self) -> str: ...
WDYT of type(key).__str__ is object.__str__: ? https://stackoverflow.com/questions/19628421/how-to-check-if-str-is-implemented-by-an-object
That didn't work. A float, for example, doesn't have its own __str__ method, but can be cast to a string somehow. I think the default object str must know how to produce a valid string.
Wouldn't float be handled earlier up though, this is the fallback case right?
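A small experiment may help with the two checks discussed in this thread. It assumes standard `typing.Protocol` semantics; `Plain` and `Custom` are made-up classes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Stringable(Protocol):
    def __str__(self) -> str: ...


class Plain:
    """Defines no __str__ of its own; it inherits object.__str__."""


class Custom:
    def __str__(self) -> str:
        return "custom"


# Every object inherits __str__ from object, so the runtime check always passes.
plain_matches = isinstance(Plain(), Stringable)

# Comparing against object.__str__ distinguishes inherited from overridden.
inherits_default = type(Plain()).__str__ is object.__str__
overrides_default = type(Custom()).__str__ is not object.__str__
```

Since the `isinstance` check accepts any object, it would not filter anything on its own. The `object.__str__` comparison does detect a missing override, but as noted above it also flags types whose default string form is perfectly usable.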
| ) | ||
| Attributes = Mapping[str, AttributeValue] | None | ||
| Attributes = Mapping[str, AnyValue] | None | ||
| # Not sure of the purpose of this type.. this is likely to confuse people.. |
There was a problem hiding this comment.
It seems like this is used for logs. We should leave it around for backwards compatability.
| # For more background, see: https://github.com/open-telemetry/opentelemetry-python/pull/4216 | ||
| if not record.args and not isinstance(record.msg, str): | ||
| # if record.msg is not a value we can export, cast it to string | ||
| if not isinstance(record.msg, _VALID_ANY_VALUE_TYPES): |
Why are we changing the behavior of logging bodies here? I'd prefer to scope the changes to just attributes.
| if isinstance(value, Sequence): | ||
| return tuple(_hash_attributes(v) for v in value) | ||
| if isinstance(value, Mapping): | ||
| return tuple((k, _hash_attributes(value[k])) for k in sorted(value)) |
How do we handle the case where value is not comparable?
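One possible answer to the comparability question, as a sketch only: sort mapping items on a derived key rather than on the keys themselves. `hash_key` is a hypothetical stand-in and is not the PR's `_hash_attributes`.

```python
from collections.abc import Mapping, Sequence


def hash_key(value):
    """Hypothetical sketch: turn a nested attribute value into a hashable key."""
    if isinstance(value, (str, bytes)):
        # str and bytes are Sequences too, so they must be handled first.
        return value
    if isinstance(value, Mapping):
        # Sort on (type name, repr) so keys of different types are never
        # compared with each other directly.
        items = sorted(
            value.items(),
            key=lambda kv: (type(kv[0]).__name__, repr(kv[0])),
        )
        return ("map", tuple((k, hash_key(v)) for k, v in items))
    if isinstance(value, Sequence):
        return ("seq", tuple(hash_key(v) for v in value))
    # Leaves (None, bool, int, float) are already hashable.
    return value


key_a = hash_key({"b": [1, {"x": None}], "a": 2})
key_b = hash_key({"a": 2, "b": [1, {"x": None}]})
mixed = hash_key({1: "x", "a": "y"})  # sorted({1: ..., "a": ...}) would raise
```

The `"map"` and `"seq"` tags keep a mapping and a list of pairs from producing the same key. Unhashable leaves such as sets are not handled here and would still fail when the key is hashed.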
| _encode_key_value(str(k), v, allow_null=allow_null) | ||
| for k, v in value.items() | ||
| ] | ||
| values=[_encode_key_value(str(k), v) for k, v in value.items()] |
I think this implementation is missing the cases described in the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/attribute-type-mapping.md#associative-arrays-with-unique-keys
Based on your other comment, we shouldn't massage any values here right ??
So a custom processor can send bad data here.. In this PR we are cleaning attribute values in the SDK, but obviously that isn't the only path that can feed into the OTLP exporters.. It is the only path we have strict control over...
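To make the null-as-empty-AnyValue behavior concrete, here is a rough sketch of an encoder that emits OTLP/JSON-style dictionaries. It is an assumption-laden illustration, not the exporter's `_encode_value`: the function name `encode_any_value` is hypothetical and the field names follow the lowerCamelCase JSON mapping of the OTLP `AnyValue` message.

```python
import base64
from collections.abc import Mapping, Sequence


def encode_any_value(value):
    """Hypothetical sketch of an OTLP/JSON-style AnyValue encoder."""
    if value is None:
        return {}  # None becomes an empty AnyValue at every nesting level
    if isinstance(value, bool):  # check before int: bool subclasses int
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": str(value)}  # 64-bit ints are strings in proto JSON
    if isinstance(value, float):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    if isinstance(value, bytes):
        return {"bytesValue": base64.b64encode(value).decode("ascii")}
    if isinstance(value, Mapping):
        return {
            "kvlistValue": {
                "values": [
                    {"key": str(k), "value": encode_any_value(v)}
                    for k, v in value.items()
                ]
            }
        }
    if isinstance(value, Sequence):
        return {"arrayValue": {"values": [encode_any_value(v) for v in value]}}
    # No massaging of unknown types here; cleaning is assumed to happen earlier.
    raise TypeError(f"Unsupported value type: {type(value).__name__}")


encoded = encode_any_value({"a": None, "b": [1, None]})
```

Raising on unknown types mirrors the position in this thread that the SDK cleans values before export, so a custom processor that bypasses that cleaning would surface the error at encode time.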
Given that this is a very significant change impacting all signal types, I'd recommend breaking this into 3 PRs:
1. Encoding logic for converting from the new `AnyValue` to Protobuf/ProtoJSON object with extensive testing.
2. Add support for logs/traces
3. Add support for metrics (optional)
Note that the original OTEP mentions that SDKs MAY perform 3. so it's best to do this last.
I'm open to breaking up the PR a bit and taking some stuff out, but I'd like to remove the old way of encoding attributes in the same PR as we add the new way, and move all attributes to the new way.
Description
This PR updates the code to support complex values everywhere as described in this OTEP -- basically it adds support for `None` type, heterogeneous arrays, and maps as attribute values.

I cleaned up the surrounding code, including refactoring `BoundedAttributes`. This was in part possible because we no longer have to support 2 sets of attribute values.

`BoundedAttributes` is used for span attributes (and span event attributes and span link attributes), log attributes, instrumentation scope attributes, and resource attributes, but I think it's completely optional for metric points -- all the various metric point classes just accept a generic `attributes` value, and I think it's up to instrumentations to decide on whether to use it or not. I don't see any guidance requiring people do it, and I don't see any usages of `BoundedAttributes` in contrib.

For the `bytes` type I preserve the existing behavior by decoding it to a string via utf-8. If that fails, then I just pass it through as `bytes`, which is one of the new extended attribute fields. It is valid to pass non-utf-8 bytes as an attribute value now.

I updated the encoding of `None` so that it is always encoded as an empty `AnyValue`. I believe this is in line with the spec: open-telemetry/opentelemetry-specification#4392 - previously we would do this for Arrays but not Maps and not when None is encoded as a primitive.

I added a warning log when we cast types to `str()` as a fallback (#4808 has some context on why we do that) -- I think instrumentations should be passing a well formed AnyValue type into `set_attribute`, we don't know the best way to cast these unknown types.

One thing that's missing from the spec is if leaf nodes should count towards the max attribute total count, I think that is the intent but it was not clearly documented (see #4587 and #4587) -- I can clarify and clean that up in a follow up PR though, as this one is already pretty big.

The failing public symbols check is due to the expanded attribute value, it's an additive change.
How Has This Been Tested?
Unit tests
Does This PR Require a Contrib Repo Change?
Yes, I had to submit open-telemetry/opentelemetry-python-contrib#4646. If someone attempts to use an old (anything less than the next release of) `opentelemetry-instrumentation-logging` with the next release of `opentelemetry-api`, it will not work. I'm confused about what to do to resolve that issue though.