Skip to content

Expand attribute value type to support complex values everywhere and cleanup surrounding code#5266

Open
DylanRussell wants to merge 34 commits into
open-telemetry:mainfrom
DylanRussell:extended_attributes
Open

Expand attribute value type to support complex values everywhere and cleanup surrounding code#5266
DylanRussell wants to merge 34 commits into
open-telemetry:mainfrom
DylanRussell:extended_attributes

Conversation

@DylanRussell

@DylanRussell DylanRussell commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Description

This PR updates the code to support complex values everywhere as described in this OTEP -- basically it adds support for None type, heterogeneous arrays, and maps as attribute values.

I cleaned up the surrounding code, including refactoring BoundedAttributes.. this was in part possible because we no longer have to support 2 sets of attribute values.

BoundedAttributes is used for span attributes (and span event attributes and span link attributes), log attributes, instrumentation scope attributes, and resource attributes, but i think it's completely optional for metric points -- all the various metric point classes just accept a generic attributes value, and I think it's up to instrumentations to decide on whether to use it or not.. I don't see any guidance requiring people do it, and I don't see any usages of BoundedAttributes in contrib.

For the bytes type I preserve the existing behavior by decoding it to a string via utf-8. If that fails, then I just pass it through as bytes, which is one of the new extended attribute fields.. It is valid to pass non-utf-8 bytes as an attribute value now.

I updated the encoding of None so that it is always encoded as an empty AnyValue.. I believe this is in-line with the spec: open-telemetry/opentelemetry-specification#4392 - previously we would do this for Arrays but not Maps and not when None is encoded as a primitive..

I added a warning log when we cast types to str() as a fallback (#4808 has some context on why we do that) -- I think instrumentations should be passing a well formed AnyValue type into set_attribute, we don't know the best way to cast these unknown types..

One thing that's missing from the spec is if leaf nodes should count towards the max attribute total count, I think that is the intent but it was not clearly documented (see #4587 and #4587) -- I can clarify and clean that up in a follow up PR though, as this one is already pretty big..

The failing public symbols check is due to the expanded attribute value, it's an additive change..

Type of change

Please delete options that are not relevant.

  • [ x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Unit tests

Does This PR Require a Contrib Repo Change?

Yes I had to submit open-telemetry/opentelemetry-python-contrib#4646 -- if someone attempts to use an old (anything less than the next release of) opentelemetry-instrumentation-logging with the next release of opentelemetry-api it will not work.. I'm confused about what to do to resolve that issue though..

Checklist:

  • [ x] Followed the style guidelines of this project
  • [ x] Changelogs have been updated
  • [ x] Unit tests have been added
  • [ x] Documentation has been updated

@DylanRussell DylanRussell requested a review from a team as a code owner June 1, 2026 20:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands OpenTelemetry Python’s attribute value model across the API, SDK, and OTLP exporters to support “complex” values everywhere (e.g., None, heterogeneous arrays, and maps), aligning behavior with the referenced OTEP and updating surrounding utilities/tests accordingly.

Changes:

  • Broadens AnyValue/AttributeValue/Attributes typing and updates public API surfaces (tracing/logs/events/metrics) to accept complex attribute values.
  • Refactors attribute cleaning/storage (BoundedAttributes) and metric attribute-keying (_hash_attributes) to handle complex values consistently.
  • Updates OTLP proto/json encoding to represent None as an empty AnyValue and adjusts unit tests/benchmarks for the new semantics.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
opentelemetry-sdk/tests/trace/test_trace.py Updates span attribute/event tests for complex values and bytes behavior.
opentelemetry-sdk/tests/resources/test_resources.py Updates resource attribute validation tests for new cleaning/stringify behavior.
opentelemetry-sdk/tests/metrics/test_view_instrument_match.py Updates metrics tests to use _hash_attributes aggregation keys.
opentelemetry-sdk/tests/metrics/test_measurement_consumer.py Adjusts a concurrency test to pass attributes explicitly.
opentelemetry-sdk/src/opentelemetry/sdk/util/instrumentation.py Switches instrumentation scope attributes typing to Attributes.
opentelemetry-sdk/src/opentelemetry/sdk/resources/init.py Aligns resource attribute typing/imports with new Attributes definition.
opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py Introduces _hash_attributes and deep-copies measurement attributes prior to aggregation.
opentelemetry-sdk/src/opentelemetry/sdk/environment_variables/init.py Clarifies docs for attribute value length limits with string/bytes focus.
opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/_exceptions.py Updates exception-attribute merging to the unified Attributes type.
opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/init.py Updates logging translation checks/warnings for AnyValue casting.
opentelemetry-sdk/src/opentelemetry/sdk/_events/init.py Updates event logger provider attribute typing to Attributes.
opentelemetry-sdk/benchmarks/metrics/test_benchmark_metrics_histogram.py Expands benchmarks to cover complex/mapping/array attribute shapes.
opentelemetry-api/tests/logs/test_proxy.py Updates logs proxy tests to Attributes.
opentelemetry-api/tests/events/test_proxy_event.py Updates events proxy tests to Attributes.
opentelemetry-api/tests/attributes/test_attributes.py Reworks attribute tests around new _clean_attribute_value/BoundedAttributes.
opentelemetry-api/src/opentelemetry/util/types.py Redefines AnyValue/AttributeValue and widens Attributes accordingly; removes _ExtendedAttributes.
opentelemetry-api/src/opentelemetry/trace/span.py Updates span API type hints to use collections.abc.Mapping and types.Attributes in NoOp implementation.
opentelemetry-api/src/opentelemetry/attributes/init.py Major refactor: new recursive cleaner and BoundedAttributes as a dict subclass.
opentelemetry-api/src/opentelemetry/_logs/_internal/init.py Updates log API typing from _ExtendedAttributes to Attributes.
opentelemetry-api/src/opentelemetry/_events/init.py Updates event API typing from _ExtendedAttributes to Attributes.
exporter/opentelemetry-exporter-otlp-proto-grpc/benchmarks/test_benchmark_trace_exporter.py Updates patch target to TraceServiceStub.
exporter/opentelemetry-exporter-otlp-proto-common/tests/test_log_encoder.py Refactors OTLP proto log encoder tests around new null/AnyValue behavior.
exporter/opentelemetry-exporter-otlp-proto-common/tests/test_attribute_encoder.py Adjusts encoder failure tests to use an unencodable object.
exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/init.py Stops using allow_null; encodes null as empty AnyValue.
exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/init.py Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue.
exporter/opentelemetry-exporter-otlp-json-common/tests/test_log_encoder.py Updates JSON log encoder tests for null-as-empty-AnyValue body.
exporter/opentelemetry-exporter-otlp-json-common/tests/test_common_encoder.py Updates JSON common encoder tests for null and unencodable values.
exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/_log_encoder/init.py Removes allow_null usage in JSON log encoding.
exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/init.py Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue.
docs/conf.py Updates Sphinx type-hint resolution aliasing; now also touches metrics internals.
.changelog/5266.added Adds changelog entry for extended/complex attribute support across packages.
Comments suppressed due to low confidence (1)

opentelemetry-api/src/opentelemetry/trace/span.py:96

  • The note in Span.set_attributes says the behavior of None value attributes is undefined, but this PR explicitly adds support for None/null attribute values across the API/SDK. Please update the docstring to reflect the new supported semantics (even if None remains discouraged).
    def set_attributes(
        self, attributes: Mapping[str, types.AttributeValue]
    ) -> None:
        """Sets Attributes.

        Sets Attributes with the key and value passed as arguments dict.

        Note: The behavior of `None` value attributes is undefined, and hence
        strongly discouraged. It is also preferred to set attributes at span
        creation, instead of calling this method later since samplers can only
        consider information already present during span creation.
        """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py
Comment on lines +107 to +111
attributes = {}
if measurement.attributes:
# Deepcopy to prevent the user from modifying the attributes after the
# measurement has been consumed.
attributes = copy.deepcopy(measurement.attributes)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using BoundAttributes for metrics appears to be completely optional, so we can't rely on that to ensure attribute values (and the attributes Map as a whole) are immutable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes is always just a map which can be mutated.... I don't see how a shallow copy is better than a deep copy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do share this concern though.. We would save a lot of copying by just using the mutable types passed through

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this code preexisting anyway?

@DylanRussell DylanRussell Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned up this code a tiny bit..

I'm pretty sure this was the intent of the previous code (because there was a simple test that failed) which was:

 elif measurement.attributes is not None:
            attributes = dict(measurement.attributes)

But that doesn't copy lists/dicts

Comment thread docs/conf.py Outdated
Comment thread opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

@MikeGoldsmith MikeGoldsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I am concerned this is a fairly big change so we need to make sure it's done safely.

# For more details, refer to the OTel specification:
# https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#type-any
AnyValue = (
AnyValue = AttributeValue = (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making AttributeValue an AnyValue is greatly increasing the available types. Should this be considered a breaking change? Anyone who used isinstance could see very different results. I don't disagree with the change, but think it needs to be more loudly documented.

@DylanRussell DylanRussell Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.. I've now added a clear description of how it's being expanded to the changelog.. Is there something else you think we should do ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be considered a breaking change if we're expanding the range of acceptable types.

Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
)
Attributes = Mapping[str, AttributeValue] | None
Attributes = Mapping[str, AnyValue] | None
# Not sure of the purpose of this type.. this is likely to confuse people..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not used, should we remove it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of removing it.. it's technically a breaking change because there's no _ prefix so it's public, but I don't see the use case for it. I doubt people are using it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is used for logs. We should leave it around for backwards compatability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's only usage is in a very stale .pyi file that should be deleted.. No usages in contrib.. Also the type itself doesn't make sense. I vote we delete it

Comment thread docs/conf.py Outdated

# Provide AnyValue in opentelemetry.attributes module's namespace so the
# "AnyValue" forward reference in opentelemetry.util.types._ExtendedAttributes
# TODO: Is there a better way to do this ? I have to do this re-import thing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment about, can we remove it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raajheshkannaa do you know a better way to resolve this issue ? I tried copying this approach but I'm still getting failures and it's a pain

values=[_encode_key_value(str(k), v) for k, v in value.items()]
)
)
raise Exception(f"Invalid type {type(value)} of value {value}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably convert to string here and potentially log a warning:

Any other values not listed above SHOULD be converted to AnyValue's string_value field if the source data can be serialized to a string (can be stringified) using toString() or stringify functions available in programming languages.

If the source data cannot be serialized to a string then the value SHOULD be converted AnyValue's bytes_value field by serializing it into a byte sequence by any means available.

If the source data cannot be serialized neither to a string nor to a byte sequence then it SHOULD by converted to an empty AnyValue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind this should be handled earlier



_logger = logging.getLogger(__name__)
class _DuplicateFilter(logging.Filter):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like duplicating this duplicate filter here, and having to add it to the logger here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leave it out ? These logs could be noisy. I can't import the other DuplicateFilter without taking a dependency on opentelemetry-sdk. There isn't a shared code package I can use either.

"""
if isinstance(value, (type(None), bool, int, float)):
return value
if isinstance(value, bytes):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what the spec says should happen? In my view, if bytes are passed in, we should leave them as-is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec doesn't say to do it. The current code converts bytes to str this way though, if we change it could break people.

I suppose an alternative is to do it like this temporarily (and log a warning), saying that in the future we will leave the field as bytes and then switch over after a few releases.. WDYT?

# converting to string raises.
) != _InvalidAttributeValue.INVALID_VALUE:
cleaned_mapping[key] = cleaned_value
return MappingProxyType(cleaned_mapping)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is MappingProxyType needed here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's needed TBH.. It's immutable.. Since Mappings/Sequences can be mutable, do we want to allow people to mutate these outside of __setitem__ and have the attribute value be mutated? My thinking was no because that bypasses the validation / cleaning of attributes.. But the user would have to reach into this data structure and get the Mapping/Sequence and then mutate it, and if they do that maybe it's intentional.. This also introduces some overhead obviously because we make a copy.. WDYT?

Comment on lines +114 to +117
try:
return str(value)
except Exception:
raise TypeError(
f"Invalid type {type(value).__name__} for attribute value. "
f"Expected one of {[_type_name(valid_type) for valid_type in _VALID_ANY_VALUE_TYPES]} or a "
"sequence of those types",
)


def _clean_extended_attribute(
key: str, value: types.AnyValue, max_len: int | None
) -> types.AnyValue:
"""Checks if attribute value is valid and cleans it if required.

The function returns the cleaned value or None if the value is not valid.
except Exception: # pylint: disable=broad-exception-caught
return _InvalidAttributeValue.INVALID_VALUE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec states that if we can't convert to string, we should fallback to an empty any value

If the source data cannot be serialized neither to a string nor to a byte sequence then it SHOULD by converted to an empty AnyValue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so in accordance with that I switched the logic to fallback to None which will ultimately get converted to an empty AnyValue..

type(value),
)
try:
return str(value)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably define a helper type here, and use an isinstance() check before serializing.

@runtime_checkable
class Stringable(Protocol):
    def __str__(self) -> str: ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't work.. a float for example doesn't have it's own __str__ method, but can be casted to a string somehow.. I think the default object str must know how to produce a valid string

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't float be handled earlier up though, this is the fallback case right?

)
Attributes = Mapping[str, AttributeValue] | None
Attributes = Mapping[str, AnyValue] | None
# Not sure of the purpose of this type.. this is likely to confuse people..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is used for logs. We should leave it around for backwards compatability.

# For more background, see: https://github.com/open-telemetry/opentelemetry-python/pull/4216
if not record.args and not isinstance(record.msg, str):
# if record.msg is not a value we can export, cast it to string
if not isinstance(record.msg, _VALID_ANY_VALUE_TYPES):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing the behavior of logging bodies here? I'd prefer to scope the changes to just attributes.

if isinstance(value, Sequence):
return tuple(_hash_attributes(v) for v in value)
if isinstance(value, Mapping):
return tuple((k, _hash_attributes(value[k])) for k in sorted(value))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we handle the case where value is not comparable?

_encode_key_value(str(k), v, allow_null=allow_null)
for k, v in value.items()
]
values=[_encode_key_value(str(k), v) for k, v in value.items()]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your other comment, we shouldn't massage any values here right ??

So a custom processor can send bad data here.. In this PR we are cleaning attribute values in the SDK, but obviously that isn't the only path that can feed into the OTLP exporters.. It is the only path we have strict control over...

@herin049 herin049 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a very significant change impacting all signal types, I'd recommend breaking this into 3 PRs:

  1. Encoding logic for converting from the new AnyValue to Protobuf/ProtoJSON object with extensive testing.
  2. Add support for logs/traces
  3. Add support for metrics (optional)

Note that the original OTEP mentions that SDKs MAY perform 3. so it's best to do this last.

@github-project-automation github-project-automation Bot moved this to Reviewed PRs that need fixes in Python PR digest Jun 10, 2026
@DylanRussell

Copy link
Copy Markdown
Contributor Author

I'm open to breaking up the PR a bit and taking some stuff out, but i'd like to remove the old way of encoding attributes in the same PR as we add the new way, and move all attributes to the new way..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Reviewed PRs that need fixes

Development

Successfully merging this pull request may close these issues.

5 participants