Use Set hash in Distinct (2nd attempt)#7175
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7175 +/- ##
=====================================
Coverage 85.4% 85.4%
=====================================
Files 269 271 +2
Lines 24297 24391 +94
=====================================
+ Hits 20762 20843 +81
- Misses 3158 3170 +12
- Partials 377 378 +1
🚀 New features to boost your workflow:
|
MrAlias
left a comment
There was a problem hiding this comment.
We need to consider what is an acceptable level of hash collision here. We may want to use a larger bit size for the hash or look into ways to avoid the collision if we determine the collision rate too high. We may also consider this approach acceptable.
|
For reference, Prometheus has used as similar size hash for their implementation. There haven't been any known issues related to collisions there and it has been out for ~ 10 years. |
|
Some input from the sidelines (and yay on the performance improvements! 🥳):
It looks like collision handling was added to Besides the client library, we have seen some real collisions over the years in the Prometheus server that have all been addressed:
We still use hashes without collision checking in some places, which are hopefully unlikely enough to ever cause a collision. See also prometheus/prometheus#12519 for more discussion on hash collision potential in Prometheus. |
bwplotka
left a comment
There was a problem hiding this comment.
Amazing, thanks! Looks good modulo previous comments around runes/bytes on hash string and unmarshalling questions 👍🏽
a8037e9 to
6bf5e93
Compare
|
@dashpole are we going to look into collision remediation techniques based on the Prometheus groups experience? |
|
Yes, I will look into what was done in prometheus/client_golang#221. |
|
The TL;DR is that handling collisions just means using a I think the biggest difference for the attributes package is that we would need to ensure that |
162f87d to
6ffffec
Compare
Forked from #7175 ``` $ go test -timeout 60s -run=xxxxxMatchNothingxxxxx -test.benchtime=10ms -count 6 -cpu 1 -bench=BenchmarkEquals ./... goos: linux goarch: amd64 pkg: go.opentelemetry.io/otel/attribute cpu: Intel(R) Xeon(R) CPU @ 2.20GHz BenchmarkEquals/Empty 1314884 9.008 ns/op BenchmarkEquals/Empty 1875915 8.461 ns/op BenchmarkEquals/Empty 1461613 7.749 ns/op BenchmarkEquals/Empty 1636912 9.359 ns/op BenchmarkEquals/Empty 1863039 6.355 ns/op BenchmarkEquals/Empty 1789053 6.336 ns/op BenchmarkEquals/1_string_attribute 674168 16.92 ns/op BenchmarkEquals/1_string_attribute 701983 16.42 ns/op BenchmarkEquals/1_string_attribute 692001 16.52 ns/op BenchmarkEquals/1_string_attribute 687970 16.29 ns/op BenchmarkEquals/1_string_attribute 751766 16.58 ns/op BenchmarkEquals/1_string_attribute 703534 16.88 ns/op BenchmarkEquals/10_string_attributes 85400 137.1 ns/op BenchmarkEquals/10_string_attributes 91045 136.1 ns/op BenchmarkEquals/10_string_attributes 90973 150.7 ns/op BenchmarkEquals/10_string_attributes 62877 177.5 ns/op BenchmarkEquals/10_string_attributes 90780 194.2 ns/op BenchmarkEquals/10_string_attributes 91058 144.6 ns/op BenchmarkEquals/1_int_attribute 624625 18.72 ns/op BenchmarkEquals/1_int_attribute 689478 16.03 ns/op BenchmarkEquals/1_int_attribute 719173 15.68 ns/op BenchmarkEquals/1_int_attribute 707005 16.18 ns/op BenchmarkEquals/1_int_attribute 752048 15.94 ns/op BenchmarkEquals/1_int_attribute 752034 16.23 ns/op BenchmarkEquals/10_int_attributes 90302 132.5 ns/op BenchmarkEquals/10_int_attributes 89929 131.9 ns/op BenchmarkEquals/10_int_attributes 86578 135.2 ns/op BenchmarkEquals/10_int_attributes 90482 133.1 ns/op BenchmarkEquals/10_int_attributes 90255 132.0 ns/op BenchmarkEquals/10_int_attributes 87615 134.6 ns/op PASS ok go.opentelemetry.io/otel/attribute 0.578s PASS ok go.opentelemetry.io/otel/attribute/internal 0.017s ``` --------- Co-authored-by: Flc゛ <four_leaf_clover@foxmail.com> Co-authored-by: Robert Pająk <pellared@hotmail.com>
a69ecb9 to
431c5ed
Compare
Forked from #7175 (comment) This adds a test for JSON marshaling of attribute sets. --------- Co-authored-by: Flc゛ <four_leaf_clover@foxmail.com> Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
a3fb9c7 to
ebbb889
Compare
|
There is a significant performance cost (the cost of set.Equals from #7262) to check for duplicates, but it is still a huge improvement over the current state. I'm trying to add a unit test with a collision, but it is taking a while for the fuzzer to generate a collision for me... Even with a cardinality of 100 million attributes, I didn't hit one :). I updated the benchmark results in the description. |
Julius already cited this, but I too want to note there were issues: Prometheus has generally moved from
|
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
70744f0 to
3add646
Compare
c8d11bd to
4ed4e0d
Compare
|
I plan to investigate xxHash in a separate issue. From my brief research, it is primarily designed for efficient hashing of large amounts of input data (e.g. checksums for files). It should still be faster than fnv64-1a for our inputs, but the difference may or may not be significant. I've seen conflicting statements for which algorithm has a better distribution of hash values, although most seem to favor xxHash as well. I don't think that should block this PR considering the substantial performance benefits it brings. |
## Overview ### Added - Greatly reduce the cost of recording metrics in `go.opentelemetry.io/otel/sdk/metric` using hashing for map keys. (#7175) - Add `WithInstrumentationAttributeSet` option to `go.opentelemetry.io/otel/log`, `go.opentelemetry.io/otel/metric`, and `go.opentelemetry.io/otel/trace` packages. This provides a concurrent-safe and performant alternative to `WithInstrumentationAttributes` by accepting a pre-constructed `attribute.Set`. (#7287) - Add experimental observability for the Prometheus exporter in `go.opentelemetry.io/otel/exporters/prometheus`. Check the `go.opentelemetry.io/otel/exporters/prometheus/internal/x` package documentation for more information. (#7345) - Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7353) - Add temporality selector functions `DeltaTemporalitySelector`, `CumulativeTemporalitySelector`, `LowMemoryTemporalitySelector` to `go.opentelemetry.io/otel/sdk/metric`. (#7434) - Add experimental observability metrics for simple log processor in `go.opentelemetry.io/otel/sdk/log`. (#7548) - Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#7459) - Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#7486) - Add experimental observability metrics for simple span processor in `go.opentelemetry.io/otel/sdk/trace`. (#7374) - Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7512) - Add experimental observability metrics for manual reader in `go.opentelemetry.io/otel/sdk/metric`. (#7524) - Add experimental observability metrics for periodic reader in `go.opentelemetry.io/otel/sdk/metric`. (#7571) - Support `OTEL_EXPORTER_OTLP_LOGS_INSECURE` and `OTEL_EXPORTER_OTLP_INSECURE` environmental variables in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7608) - Add `Enabled` method to the `Processor` interface in `go.opentelemetry.io/otel/sdk/log`. All `Processor` implementations now include an `Enabled` method. (#7639) - The `go.opentelemetry.io/otel/semconv/v1.38.0` package. The package contains semantic conventions from the `v1.38.0` version of the OpenTelemetry Semantic Conventions. See the [migration documentation](./semconv/v1.38.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.37.0.`(#7648) ### Changed - `Distinct` in `go.opentelemetry.io/otel/attribute` is no longer guaranteed to uniquely identify an attribute set. Collisions between `Distinct` values for different Sets are possible with extremely high cardinality (billions of series per instrument), but are highly unlikely. (#7175) - `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/trace` synchronously de-duplicates the passed attributes instead of delegating it to the returned `TracerOption`. (#7266) - `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/meter` synchronously de-duplicates the passed attributes instead of delegating it to the returned `MeterOption`. (#7266) - `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/log` synchronously de-duplicates the passed attributes instead of delegating it to the returned `LoggerOption`. (#7266) - Rename the `OTEL_GO_X_SELF_OBSERVABILITY` environment variable to `OTEL_GO_X_OBSERVABILITY` in `go.opentelemetry.io/otel/sdk/trace`, `go.opentelemetry.io/otel/sdk/log`, and `go.opentelemetry.io/otel/exporters/stdout/stdouttrace`. (#7302) - Improve performance of histogram `Record` in `go.opentelemetry.io/otel/sdk/metric` when min and max are disabled using `NoMinMax`. (#7306) - Improve error handling for dropped data during translation by using `prometheus.NewInvalidMetric` in `go.opentelemetry.io/otel/exporters/prometheus`.⚠️ **Breaking Change:** Previously, these cases were only logged and scrapes succeeded. Now, when translation would drop data (e.g., invalid label/value), the exporter emits a `NewInvalidMetric`, and Prometheus scrapes **fail with HTTP 500** by default. To preserve the prior behavior (scrapes succeed while errors are logged), configure your Prometheus HTTP handler with: `promhttp.HandlerOpts{ ErrorHandling: promhttp.ContinueOnError }`. (#7363) - Replace fnv hash with xxhash in `go.opentelemetry.io/otel/attribute` for better performance. (#7371) - The default `TranslationStrategy` in `go.opentelemetry.io/exporters/prometheus` is changed from `otlptranslator.NoUTF8EscapingWithSuffixes` to `otlptranslator.UnderscoreEscapingWithSuffixes`. (#7421) - Improve performance of concurrent measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7427) - Include W3C TraceFlags (bits 0–7) in the OTLP `Span.Flags` field in `go.opentelemetry.io/exporters/otlp/otlptrace/otlptracehttp` and `go.opentelemetry.io/exporters/otlp/otlptrace/otlptracegrpc`. (#7438) - The `ErrorType` function in `go.opentelemetry.io/otel/semconv/v1.37.0` now handles custom error types. If an error implements an `ErrorType() string` method, the return value of that method will be used as the error type. (#7442) ### Fixed - Fix `WithInstrumentationAttributes` options in `go.opentelemetry.io/otel/trace`, `go.opentelemetry.io/otel/metric`, and `go.opentelemetry.io/otel/log` to properly merge attributes when passed multiple times instead of replacing them. Attributes with duplicate keys will use the last value passed. (#7300) - The equality of `attribute.Set` when using the `Equal` method is not affected by the user overriding the empty set pointed to by `attribute.EmptySet` in `go.opentelemetry.io/otel/attribute`. (#7357) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7372) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7372) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`. (#7372) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`. (#7372) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#7372) - Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#7372) - Fix `AddAttributes`, `SetAttributes`, `SetBody` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not mutate input. (#7403) - Do not double record measurements of `RecordSet` methods in `go.opentelemetry.io/otel/semconv/v1.37.0`. (#7655) - Do not double record measurements of `RecordSet` methods in `go.opentelemetry.io/otel/semconv/v1.36.0`. (#7656) ### Removed - Drop support for [Go 1.23]. (#7274) - Remove the `FilterProcessor` interface in `go.opentelemetry.io/otel/sdk/log`. The `Enabled` method has been added to the `Processor` interface instead. All `Processor` implementations must now implement the `Enabled` method. Custom processors that do not filter records can implement `Enabled` to return `true`. (#7639)
Re-opening #5028 with new benchmarks. For cases with 10 attributes, this reduces the overhead of metric measurements by ~80-90% (depending on lock contention). It introduces a small probability of collision for attribute sets in the metrics SDK. For an instrument with 1 million different attribute sets, the probability of a collision is approximately 2 * 10^-8. For a more "normal" cardinality of 1000 on an instrument, it is approximately 2 * 10^-17.
Parallel benchmarks:
Single-threaded benchmarks: