Summary
The Sampling SIG has been working on a proposal to follow the W3C tracecontext group, which has added a flag to convey definite information about randomness in the TraceID.
In particular, we aim to address the TODO about the TraceIDRatioBased Sampler:
TODO: Add details about how the TraceIdRatioBased is implemented as a function of the TraceID. #1413
We are looking for community input on a choice which will impact implementation complexity for OpenTelemetry Samplers as well as human-interpretability of the raw data.
Note this proposal was co-authored by @kentquirk @oertl @PeterF778 and @jmacd.
ACTION ITEM: Please review and vote for your preferred encoding strategy in the comments below. OpenTelemetry Tracing SDK authors as well as OpenTelemetry Collector trace processors will be asked to implement the encoding and decoding strategies here in order to communicate about sampling selectivity, and we need your input!
Background
We propose to add information to the W3C Trace Context specification to allow consistent sampling decisions to be made across the entire lifetime of a trace.
The expectation is that trace IDs should contain at least 56 bits of randomness in a known portion of the ID. This value is known as r, and there is a bit in the trace header that indicates its presence.
In probabilistic sampling, the sampling decision is a binary choice to keep (store) or drop (discard) a trace. Because traces are composed of multiple spans, we want to be sure that the same decision is made for all elements in the trace. Therefore, we don’t make a truly random decision at each stage. We instead wish to use the randomness embedded in the trace ID so that all stages can make consistent decisions.
In order to make consistent decisions, we need to propagate not only the randomness (the r value), but also the sampling selectivity used. In other words, in a trace that travels between services A, B, and C, any decision made by B should use the same information as a decision made by A, and B could potentially modify the selectivity so that C could also make an appropriate decision.
As noted, the r value expresses a 56-bit random value that can be used as the source of randomness for a probabilistic sampling decision. The intent of this proposal is to express the sampling selectivity that was used to make the decision, and to do it in the trace state.
Sampling selectivity can be described in several different ways:
- As a probability – a value between 0 (100% chance of being dropped) and 1 (100% chance of being kept).
- As a sampling rate – an indication of the ratio between the total number of items and the number of items kept. It is typically a positive integer. It is the inverse of the probability – a sampling rate of 8 is equivalent to a sampling probability of 0.125.
- As a threshold – if r (56 bits of randomness) is treated as a 56-bit integer between 0 and (2^56)-1, then the threshold is the limit below which the sample will be kept. For example, the sampling rate of 8 is equivalent to a threshold value of 2^53, or 0x20000000000000.
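
The three descriptions above are interchangeable. As a minimal sketch (the function names are hypothetical, and rounding to the nearest integer threshold is an assumption), the conversions and the keep/drop decision could look like this in Python:

```python
from fractions import Fraction

T_MAX = 1 << 56  # 2^56: number of distinct values of the 56-bit r


def threshold_from_probability(probability: Fraction) -> int:
    """Return the threshold below which r is kept (assumed: round to nearest)."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be in [0, 1]")
    return round(probability * T_MAX)


def threshold_from_rate(rate: int) -> int:
    """A sampling rate of N is equivalent to a probability of 1/N."""
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    return threshold_from_probability(Fraction(1, rate))


def should_keep(r: int, threshold: int) -> bool:
    """Keep the item when its 56-bit randomness is below the threshold."""
    return r < threshold


print(hex(threshold_from_rate(8)))  # 0x20000000000000
```

Using exact fractions here avoids any floating-point rounding before the threshold is computed; a real sampler would more likely work from whichever encoding is chosen below.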
Minimum requirements
1. Given the sampling information in the trace state, it MUST be specified, for every possible representation on every platform, how that information translates to the applied sampling threshold (the value that was compared against the random bits). Only then can the sampling decision be reproduced from the 56 random bits with 100% consistency.
2. From that mapping, it can be derived which of the 2^56+1 sampling thresholds that are meaningful with 56 random bits can be expressed by the sampling information in the trace state. Each proposal should therefore be clear about which thresholds it actually supports. The set of supported thresholds also defines the set of possible sampling probabilities: the sampling probability is the threshold multiplied by 2^(-56).
3. For each supported threshold, there should be a lossless way to map it to the sampling information that is written to the trace state. Lossless means that the reverse mapping described in 1. yields exactly the chosen threshold again. The mapping from thresholds to the sampling information is important for adaptive sampling, where the threshold is chosen automatically.
Objective
We would like to express this sampling probability/rate/threshold in a reasonably compact way in the trace state; we have been referring to this value as t. The expression should be easy and efficient to generate and parse in any programming language. The notation should also be able to describe cases of non-probabilistic sampling (corresponding to the zero adjusted count or the old p=63 case).
The sampling SIG has been discussing this issue for some time, and we have examined several proposals. Each proposal has its strengths and weaknesses and we are looking for input from the larger community.
Note that this document includes just a summary of the proposals, below; all of them have been specified in sufficient detail to resolve most implementation issues. We are hoping for input from the community to help make the big decision about the implementation direction.
Request for input
The major difference in these proposals that we wish to seek input on is whether it is more important to optimize for threshold calculation (option A) at the expense of human readability, or whether to choose one of the other options which are readable and accessible, but make threshold calculations harder to work with.
List of options
When we refer to Tmax, we mean 2^56 (0x100000000000000 or 72057594037927936)
Option A: Hex Threshold
| Title | A: Hex Threshold |
|---|---|
| Description | The t value is expressed as a threshold value in the range [0, 2^56-1] using 14 hexadecimal digits; an absent t-value represents a threshold of 2^56 corresponding to 100% sampling probability |
| Examples | Keep all: t is omitted; Keep 1 in 10: t=19999999999999; Keep 1 in 8: t=20000000000000; Keep half: t=80000000000000; Keep 2/3: t=aaaaaaaaaaaaaa; Keep 1 in 1000: t=004189374bc6a7 |
| Mapping to the sampling threshold | threshold = parseHexInt(t). If t is absent, the threshold is 2^56. |
| Supported thresholds | All 2^56 + 1 thresholds that are meaningful when using 56 random bits. Possible thresholds are {0, 1, 2, 3, …, 2^56-1, 2^56}. |
| Mapping a supported sampling threshold to t | t = encodeHexInt(threshold). If the threshold is 2^56, corresponding to 100% sampling probability, the t-value is not set. |
| Advantages | Both t and r can be parsed as hex strings using the same parsing function and compared directly with no further processing. Simplest approach satisfying the minimum requirements above. t-value can be compared by humans to trace-ID to understand the sampling decision. No floating-point operations or complex parsing needed. |
| Disadvantages | t-value cannot be directly read by humans as sampling probability. |
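
To illustrate how little code option A needs, here is a minimal Python sketch. The function names are hypothetical, and it assumes that t and r are both lowercase strings of exactly 14 hex digits; neither assumption is fixed by the proposal text.

```python
from typing import Optional

T_MAX = 1 << 56  # threshold meaning "keep everything"
HEX_DIGITS = set("0123456789abcdef")


def encode_hex_threshold(threshold: int) -> Optional[str]:
    """Return the t-value, or None when t should be omitted (100% sampling)."""
    if not 0 <= threshold <= T_MAX:
        raise ValueError("threshold out of range")
    if threshold == T_MAX:
        return None
    return format(threshold, "014x")  # zero-padded to 14 hex digits


def decode_hex_threshold(t: Optional[str]) -> int:
    """Return the threshold for a t-value; an absent t means 2^56."""
    if t is None:
        return T_MAX
    if len(t) != 14 or not set(t) <= HEX_DIGITS:
        raise ValueError("t must be 14 lowercase hex digits")
    return int(t, 16)


def should_keep(r_hex: str, t: Optional[str]) -> bool:
    """r and t are parsed the same way and compared directly."""
    return int(r_hex, 16) < decode_hex_threshold(t)
```

For example, `should_keep("1fffffffffffff", "20000000000000")` is true, because the randomness is just below the 1-in-8 threshold.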
Option A1: Hex Threshold with omission of trailing zeros
| Title | A1: Hex Threshold (omission of trailing zeros allowed) |
|---|---|
| Description | The t value is expressed as a threshold value in the range [0, 2^56-1] using up to 14 hexadecimal digits; an absent t-value represents a threshold of 2^56, corresponding to 100% sampling probability. Trailing 0s may be omitted for brevity. |
| Examples | Keep all: t is omitted; Keep 1 in 10: t=19999999999999; Keep 1 in 8: t=2; Keep half: t=8; Keep 2/3: t=aaaaaaaaaaaaaa; Keep 1 in 1000: t=004189374bc6a7 |
| Mapping to the sampling threshold | t is first padded with trailing zeros if it has fewer than 14 hex digits, then threshold = parseHexInt(t). If t is absent, the threshold is 2^56. |
| Supported thresholds | All 2^56 + 1 thresholds that are meaningful when using 56 random bits |
| Mapping a supported sampling threshold to t | t = encodeHexInt(threshold). If the threshold is 2^56, corresponding to 100% sampling probability, the t-value is not set. Trailing zeros may be omitted. |
| Advantages | Both t and r can be parsed as hex strings and compared directly with no further processing. Simplest approach satisfying the minimum requirements above. t-value can be compared by humans to trace-ID to understand the sampling decision. No floating-point operations or complex parsing needed. Compact representation for certain probabilities, especially those that are powers of two. |
| Disadvantages | t-value cannot be directly read by humans as sampling probability. Compact power-of-2 representations are almost misleading. Other common sample rates are not expressed in compact or obvious ways. |
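
The only difference from option A is the handling of trailing zeros. A minimal Python sketch follows; the function names are hypothetical, and writing a threshold of 0 as a single "0" is an assumption the table does not settle.

```python
from typing import Optional

T_MAX = 1 << 56
HEX_DIGITS = set("0123456789abcdef")


def encode_short_hex_threshold(threshold: int) -> Optional[str]:
    """Encode as hex, dropping trailing zeros; None means t is omitted."""
    if not 0 <= threshold <= T_MAX:
        raise ValueError("threshold out of range")
    if threshold == T_MAX:
        return None
    # Leading zeros are significant and stay; only trailing zeros are dropped.
    return format(threshold, "014x").rstrip("0") or "0"


def decode_short_hex_threshold(t: Optional[str]) -> int:
    """Pad with trailing zeros to 14 digits, then parse as in option A."""
    if t is None:
        return T_MAX
    if not 1 <= len(t) <= 14 or not set(t) <= HEX_DIGITS:
        raise ValueError("t must be 1 to 14 lowercase hex digits")
    return int(t.ljust(14, "0"), 16)
```

Note that the padding goes on the right: `t=2` means 0x20000000000000 (1 in 8), not the integer 2.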
Option B: Integer Sampling Rate
| Title | B: Integer Sampling Rate |
|---|---|
| Description | The t value is expressed as a positive integer representing the ratio between the total number of items and the number of items kept. |
| Examples | Keep all: t=1; Keep 1 in 8: t=8; Keep 1 in 10: t=10; Keep half: t=2; Keep 2/3: not expressible in this format; Keep 1 in 1000: t=1000; Keep none: not expressible in this format |
| Mapping to the sampling threshold | threshold = round(2^56 / parseDecimalInt(t)) |
| Supported thresholds | 2^56, 2^56/2, 2^56/3, 2^56/4, … corresponding to sampling probabilities 1, ½, ⅓, ¼, … |
| Mapping a supported sampling threshold to t | t = encodeDecimalInt(round(2^56 / threshold)). This reverse mapping can be lossless for integers up to 268M. |
| Advantages | Easy format for most common rates. Adjusted counts (extrapolation factors) are guaranteed to be integers. |
| Disadvantages | There are many values it can’t express; particularly the desire to keep more than half but less than all of the data. Mappings require floating-point divisions. |
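
The two mappings for option B can be sketched in Python as follows. The function names are hypothetical. The sketch uses integer arithmetic with round-half-up instead of a floating-point division, which is one possible reading of `round` in the table and not something the proposal specifies.

```python
T_MAX = 1 << 56


def threshold_from_rate(t: str) -> int:
    """threshold = round(2^56 / rate), computed with integers only."""
    rate = int(t)
    if rate < 1:
        raise ValueError("sampling rate must be a positive integer")
    return (2 * T_MAX + rate) // (2 * rate)


def rate_from_threshold(threshold: int) -> str:
    """t = round(2^56 / threshold), the reverse mapping."""
    if not 0 < threshold <= T_MAX:
        raise ValueError("threshold out of range")
    return str((2 * T_MAX + threshold) // (2 * threshold))


# Round trip for a few rates, including 2^28 (about 268M).
for rate in (1, 2, 3, 8, 10, 1000, 1 << 28):
    assert rate_from_threshold(threshold_from_rate(str(rate))) == str(rate)
```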
Option C: Sampling probability
| Title | C: Sampling Probability |
|---|---|
| Description | The t value is expressed as a positive decimal floating point value between 0 and 1, representing the probability of keeping a given event. |
| Examples | Keep all: t=1; Keep 1 in 8: t=.125; Keep 1 in 10: t=.1; Keep half: t=.5; Keep 2/3 ieee precision: t=.6666666666667; Keep 2/3 precision 4: t=.6667; Keep 2/3 precision 2: t=.67; Keep 1 in 1000: t=.001 |
| Mapping to the sampling threshold | threshold = 2^56 * parseDecimalFloat(t). Note that rounding is performed in parseDecimalFloat. |
| Supported thresholds | If t is a double-precision floating-point number, at least all thresholds with 3 least significant bits being equal to zero. |
| Mapping a supported sampling threshold to t | t = encodeDecimalFloat(threshold * 2^(-56)). This reverse mapping can be lossless for all supported thresholds when using enough decimal digits. |
| Advantages | Easy format for humans to read. |
| Disadvantages | Requires floating point math. Doesn’t express powers of 2 compactly. |
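
A minimal Python sketch of option C, assuming double-precision floats and hypothetical function names. It relies on `repr` producing the shortest decimal string that parses back to the same double, which is what makes the reverse mapping lossless for thresholds that fit in 53 bits.

```python
T_MAX = 1 << 56


def threshold_from_decimal(t: str) -> int:
    """threshold = 2^56 * parseDecimalFloat(t)."""
    probability = float(t)  # rounding to the nearest double happens here
    if not 0 <= probability <= 1:
        raise ValueError("probability must be in [0, 1]")
    # Scaling by a power of two is exact; round() only matters for tiny values.
    return round(probability * T_MAX)


def decimal_from_threshold(threshold: int) -> str:
    """Shortest decimal string that round-trips through a double."""
    if not 0 <= threshold <= T_MAX:
        raise ValueError("threshold out of range")
    return repr(threshold / T_MAX)


print(decimal_from_threshold(1 << 53))  # 0.125
```

Python prints a leading zero (`0.125`) where the examples above write `.125`; both parse to the same value.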
Option C1: Sampling probability with hex floating point
| Title | C1: Sampling Probability with hex floating point |
|---|---|
| Description | The t value is expressed as a positive decimal or hexadecimal floating point value between 0 and 1, representing the probability of keeping a given event. The encoding MAY use C99 and IEEE-754-2008 specified hex floating point as an exact representation. Note about precision: IEEE-754 double-precision floating point numbers carry 52 explicit bits of significand, 4 bits fewer than trace randomness. Below, “ieee precision” refers to 53 bits of precision (including the implicit leading bit). |
| Examples | Keep all: as in C; Keep 1 in 8: as in C or t=0x1p-3; Keep 1 in 10: as in C or 0x1.999999999999ap-4; Keep half: as in C or t=0x1p-1; Keep 2/3 ieee precision: as in C or 0x1.5555555555555p-1; Keep 2/3 precision 4: as in C or 0x1.5555p-1; Keep 2/3 precision 2: as in C or 0x1.55p-1; Keep 1 in 1000 ieee precision: as in C or 0x1.0624dd2f1a9fcp-10; Keep 1 in 1000 precision 4: as in C or 0x1.0625p-10; Keep 1 in 1000 precision 2: as in C or 0x1.06p-10 |
| Mapping to the sampling threshold | threshold = 2^56 * parseHexFloat(t). Note that no rounding is performed. |
| Supported thresholds | If t is a double-precision floating-point number, at least all thresholds with 3 least significant bits being equal to zero. |
| Mapping a supported sampling threshold to t | t = encodeHexFloatFromThreshold(threshold). This method is exact and lossless using built-in libraries up to 52 bits of precision, and exact and lossless at 56 bits of precision using custom code. |
| Advantages | Decimal format for humans to read (as in C), but hex format permits exact and lossless encoding of arbitrary thresholds up to 56 bits |
| Disadvantages | Users require custom code, or floating point math and a library that supports hex representation of it. Since this was added in ISO-C99 (1999) and IEEE-754 (2008), it is relatively widespread. |
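
For the hex form of option C1, Python's built-in `float.fromhex` and `float.hex` are enough as long as the threshold fits into a double. The sketch below uses hypothetical function names and only covers the standard-library path; it rejects values it cannot represent exactly instead of rounding them.

```python
T_MAX = 1 << 56


def threshold_from_hex_float(t: str) -> int:
    """threshold = 2^56 * parseHexFloat(t), hex form only."""
    probability = float.fromhex(t)
    if not 0 <= probability <= 1:
        raise ValueError("probability must be in [0, 1]")
    scaled = probability * T_MAX  # exact: multiplication by a power of two
    if not scaled.is_integer():
        raise ValueError("probability needs more than 56 fractional bits")
    return int(scaled)


def hex_float_from_threshold(threshold: int) -> str:
    """Encode exactly, or fail if a double cannot hold the threshold."""
    if not 0 <= threshold <= T_MAX:
        raise ValueError("threshold out of range")
    probability = threshold / T_MAX
    if int(probability * T_MAX) != threshold:
        raise ValueError("threshold does not fit into a double exactly")
    return probability.hex()


print(hex_float_from_threshold(1 << 53))  # 0x1.0000000000000p-3
```

`float.hex` always prints 13 hex digits after the point, so a compact form such as `0x1p-3` would need extra trimming that is not shown here.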
Option C2: Sampling Probability with unnormalized hex floating point
| Title | C2: Sampling Probability with unnormalized hex floating point |
|---|---|
| Description | In addition to C and C1, encoders are encouraged to use unnormalized hex floating point when they synthesize arbitrary probabilities, because it is lossless and can be easily read as a sampling threshold by humans. |
| Examples | Keep all: as in C1; Keep 1 in 8: as in C1 or t=0x2p-04 (threshold = 0x20000000000000); Keep 1 in 10: as in C1 or t=0x1999999999999ap-56 (threshold = 0x1999999999999a); Keep half: as in C1 or t=0x8p-04 (threshold = 0x80000000000000); Keep 2/3 full precision: as in C1 or 0xaaaaaaaaaaaaaap-56; Keep 2/3 ieee precision: as in C1 or 0xaaaaaaaaaaaaap-52; Keep 2/3 precision 4: as in C1 or 0xaaabp-16; Keep 2/3 precision 2: as in C1 or 0xabp-8; Keep 1 in 1000 ieee precision: as in C1 or 0x4189374bc6a7p-56; Keep 1 in 1000 precision 4: as in C1 or 0x4189p-24; Keep 1 in 1000 precision 2: as in C1 or 0x42p-16 |
| Mapping to the sampling threshold | threshold = 2^56 * parseHexFloat(t). Note that no rounding is performed. |
| Supported thresholds | If t is a double-precision floating-point number, at least all thresholds with 3 least significant bits being equal to zero. |
| Mapping a supported sampling threshold to t | t = encodeHexFloatFromThreshold(threshold). This method is exact and lossless using built-in libraries up to 52 bits of precision, and exact and lossless at 56 bits of precision using custom code. |
| Advantages | Decimal format for humans to read (as in C), but hex format permits exact and lossless encoding of arbitrary thresholds up to 56 bits |
| Disadvantages | Users require custom code, or floating point math and a library that supports hex representation of it. Since this was added in ISO-C99 (1999) and IEEE-754 (2008), it is relatively widespread. |
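
The "custom code" path for unnormalized hex floats can stay entirely in integer arithmetic. The following Python sketch is one possible shape for it; the function names, the accepted grammar (`0x`, lowercase hex digits, `p-`, decimal exponent), and the choice to keep leading zeros while dropping trailing ones are all assumptions.

```python
import re

T_MAX = 1 << 56
_UNNORMALIZED = re.compile(r"0x([0-9a-f]+)p-([0-9]+)")


def encode_unnormalized(threshold: int) -> str:
    """Write the threshold's hex digits followed by p-(4 * digit count)."""
    if not 0 < threshold < T_MAX:
        raise ValueError("threshold must be in (0, 2^56)")
    digits = format(threshold, "014x").rstrip("0")
    return f"0x{digits}p-{4 * len(digits)}"


def decode_unnormalized(t: str) -> int:
    """threshold = mantissa * 2^(56 - exponent), without any floats."""
    match = _UNNORMALIZED.fullmatch(t)
    if match is None:
        raise ValueError("not an unnormalized hex float")
    mantissa = int(match.group(1), 16)
    exponent = int(match.group(2))
    if exponent > 56:
        raise ValueError("more than 56 fractional bits")
    threshold = mantissa << (56 - exponent)
    if threshold > T_MAX:
        raise ValueError("probability above 1")
    return threshold


print(encode_unnormalized(1 << 53))  # 0x2p-4
```

Because no double is involved, this round-trips all 56 bits, including thresholds such as 0x4189374bc6a7 that keep their leading zeros in the encoded form.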
Option D: Combination of B and C
| Title | D: Combine B and C |
|---|---|
| Description | The t value can be either a value <1, in which case it’s interpreted as in C (or C2 preferably or C1) above. Or it can be >=1, in which case it’s interpreted as in B. Implementations are recommended to limit precision to keep the encoding of sampling probabilities compact. |
| Examples | Keep all: t=1; Keep 1 in 8: t=8 or t=.125; Keep 1 in 10: t=.1 or t=10; Keep half: t=.5 or t=2; Keep 2/3: t=.6667; Keep 1 in 1000: t=.001 or t=1000; Keep arbitrary hex-digit threshold HHHH with custom code: t=0xHHHHp-(len(HHHH)*4); Keep arbitrary hex-digit threshold HHHH with standard library: t=0x1.JJJJJp-DD, where JJJJJ and DD correspond to the normalized hex floating point value of HHHH. Note that JJJJJ is one digit longer than HHHH, due to shifting hex digits by one bit. |
| Mapping to the sampling threshold | See B and C |
| Supported thresholds | The union of B and C. Hence, if t is a double-precision floating-point number, at least all thresholds with 3 least significant bits being equal to zero. |
| Mapping a supported sampling threshold to t | Dependent on whether the threshold is supported by B or C, one has to choose the reverse mapping of B or C, respectively. |
| Advantages | The most convenient representation can be used. This allows constant user-input probabilities to be represented exactly in their original human-readable form, while it allows machine-generated probabilities to be rounded and losslessly encoded with variable precision. |
| Disadvantages | Requires floating point math or custom code. Parsing is slightly more complex. |
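
A parser for option D has to decide which interpretation applies before it can compute a threshold. The Python sketch below shows one way to dispatch; the function name is hypothetical, and treating any all-digit string as a rate and any string starting with `0x` as a hex float are assumptions about the grammar. It uses doubles for the probability branch, so it inherits their 53-bit precision limit.

```python
T_MAX = 1 << 56


def threshold_from_t(t: str) -> int:
    """Interpret t as a rate (>= 1, option B) or a probability (< 1, option C)."""
    if t.isascii() and t.isdigit():
        rate = int(t)
        if rate < 1:
            raise ValueError("sampling rate must be at least 1")
        # round(2^56 / rate) with integer arithmetic (round half up)
        return (2 * T_MAX + rate) // (2 * rate)
    if t.startswith("0x"):
        probability = float.fromhex(t)  # hex floating point, as in C1/C2
    else:
        probability = float(t)  # decimal floating point, as in C
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return round(probability * T_MAX)


# The same selectivity written three ways maps to the same threshold.
assert threshold_from_t("8") == threshold_from_t(".125") == threshold_from_t("0x2p-4")
```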
Option E: Ratio
| Title | E: Ratio |
|---|---|
| Description | The t value is expressed as a ratio of two integers n and d, separated by a slash. If n is 1 it may be omitted. |
| Examples | Keep all: t=1; Keep 1 in 8: t=8 or t=1/8; Keep 1 in 10: t=10 or t=1/10; Keep half: t=50/100 or t=1/2 or t=2; Keep 2/3: t=2/3 or t=6667/10000; Keep 1 in 1000: t=1/1000 or t=1000 |
| Mapping to the sampling threshold | threshold = parseInt(t_numerator) * 2^56 / parseInt(t_denominator) (this formula may overflow when using 64-bit integers) |
| Supported thresholds | The set of supported thresholds cannot be easily described and depends on the value range of the numerator and the denominator. |
| Mapping a supported sampling threshold to t | Not unique and difficult as it relates to finding the closest ratio to the value given by threshold * 2^(-56) |
| Advantages | Does not require floating point or complex parsing but can express a full range of sample rates. |
| Disadvantages | Converting from a probability to a ratio may lose precision. |
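
The forward mapping for option E is short in a language with arbitrary-precision integers such as Python, where the overflow concern does not arise. In this sketch the function name is hypothetical and rounding to the nearest threshold is an assumption, since the formula in the table does not say how the division is rounded.

```python
T_MAX = 1 << 56


def threshold_from_ratio(t: str) -> int:
    """Parse 'n/d', or 'd' alone (meaning 1/d), into a threshold."""
    numerator_text, slash, denominator_text = t.partition("/")
    if not slash:
        # The numerator was omitted, so it is 1.
        numerator_text, denominator_text = "1", numerator_text
    n, d = int(numerator_text), int(denominator_text)
    if d < 1 or not 0 <= n <= d:
        raise ValueError("ratio must be between 0 and 1")
    # round(n * 2^56 / d) using integers only (round half up)
    return (2 * n * T_MAX + d) // (2 * d)


# Different spellings of the same ratio give the same threshold.
assert threshold_from_ratio("1/2") == threshold_from_ratio("50/100") == threshold_from_ratio("2")
```

A 64-bit implementation would need a 128-bit intermediate or a split multiplication for `n * 2^56`, which is the overflow the table refers to.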
Option F: Powers-of-two
| Title | F: Powers of 2 |
|---|---|
| Description | The t value is expressed as the exponent of the sampling probability given by 0.5^t, see current proposal https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#consistent-probability-sampling |
| Examples | Keep all: t=0; Keep 1 in 8: t=3; Keep half: t=1 |
| Mapping to the sampling threshold | threshold = 2^(56 - parseInt(t)) |
| Supported thresholds | 2^56, 2^55, 2^54, …, 1, corresponding to sampling probabilities 1, ½, ¼, …, 2^(-56). |
| Mapping a supported sampling threshold to t | t = encodeInt(numberOfLeadingZeros(threshold)-7), provided that the threshold is a 64-bit integer. numberOfLeadingZeros requires a single CPU instruction. |
| Advantages | Simple and compact. Adjusted counts (extrapolation factors) are guaranteed to be integers. Fast mapping to thresholds and vice versa. |
| Disadvantages | Only supports power of 2 sampling probabilities. |
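
Both directions of option F are a shift and a bit count. The Python sketch below uses hypothetical function names and emulates the 64-bit `numberOfLeadingZeros` with `int.bit_length`, since Python integers have no fixed width.

```python
T_MAX = 1 << 56


def threshold_from_exponent(t: str) -> int:
    """threshold = 2^(56 - t)."""
    exponent = int(t)
    if not 0 <= exponent <= 56:
        raise ValueError("exponent must be in [0, 56]")
    return 1 << (56 - exponent)


def exponent_from_threshold(threshold: int) -> str:
    """t = numberOfLeadingZeros(threshold) - 7 for a 64-bit threshold."""
    is_power_of_two = threshold > 0 and threshold & (threshold - 1) == 0
    if not is_power_of_two or threshold > T_MAX:
        raise ValueError("threshold must be a power of two up to 2^56")
    leading_zeros = 64 - threshold.bit_length()
    return str(leading_zeros - 7)


# Every supported exponent survives a round trip.
for e in range(57):
    assert exponent_from_threshold(threshold_from_exponent(str(e))) == str(e)
```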