Skip to content

Add span_min_duration flag#1150

Merged
felixbarny merged 15 commits intoelastic:masterfrom
felixbarny:span-duration-threshold
May 6, 2020
Merged

Add span_min_duration flag#1150
felixbarny merged 15 commits intoelastic:masterfrom
felixbarny:span-duration-threshold

Conversation

@felixbarny
Copy link
Copy Markdown
Member

@felixbarny felixbarny commented Apr 21, 2020

closes #1094

@felixbarny felixbarny requested a review from eyalkoren April 21, 2020 12:48
Copy link
Copy Markdown
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful feature.
However, a risky one- may yield broken traces.
I spent some time looking into scenarios of non sampled transactions together with this and didn't find where it would break, but maybe add tests.
Specifically, the context propagation spans decide what to set as parent ID based on the sampling state, and they also modify the discard state. We DO have tests for those and it seems to be fine, so no real advice here other than think on related scenarios to test :)

@felixbarny
Copy link
Copy Markdown
Member Author

However, a risky one- may yield broken traces.

Yeah, I have to agree. For example, if someone uses the API to propagate the context to another thread, we might discard the span even though it's a parent of a longer async operation that starts after the initial span has already finished.

- More testing
- Fixes for interactions of transaction_max_spans and span_min_duration
- Revert removing noop
- TraceContext can't be de/activated anymore, only AbstractSpans
  - Removes lots of baggage in the internal API
  - This was only used to reduce (but not avoid) allocations when using the OpenTracing API bridge
  - Historically, activating a TraceContext was also needed when activating an ended span.
    Nowadays, we use reference counting to determine whether it's safe to recycle a span rather than recycling after it's ended and serialized.
- Avoids context propagation directly via TraceContext which would not mark the span as non-discardable
- Makes Spans used in OT API bridge non-recyclable
  - Previously, TraceContext has been used for finished spans
  - Now, TraceContext can't be activated anymore,
    thus spans are GC'd rather than recycled so that OT spans can be activated after they have been finished
@codecov-io
Copy link
Copy Markdown

codecov-io commented Apr 23, 2020

Codecov Report

Merging #1150 into master will increase coverage by 0.31%.
The diff coverage is 48.77%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1150      +/-   ##
============================================
+ Coverage     59.37%   59.69%   +0.31%     
+ Complexity     2862       83    -2779     
============================================
  Files           331      328       -3     
  Lines         15017    14996      -21     
  Branches       2087     2086       -1     
============================================
+ Hits           8917     8952      +35     
+ Misses         5489     5433      -56     
  Partials        611      611              
Impacted Files Coverage Δ Complexity Δ
...astic/apm/agent/bci/ElasticApmInstrumentation.java 58.82% <0.00%> (ø) 0.00 <0.00> (-9.00)
...bci/methodmatching/TraceMethodInstrumentation.java 58.18% <0.00%> (+1.03%) 0.00 <0.00> (-9.00) ⬆️
...src/main/java/co/elastic/apm/agent/impl/Scope.java 0.00% <ø> (ø) 0.00 <0.00> (ø)
...lastic/apm/agent/objectpool/ObjectPoolFactory.java 100.00% <ø> (+12.50%) 0.00 <0.00> (-9.00) ⬆️
...tpclient/ApacheHttpAsyncClientInstrumentation.java 38.88% <0.00%> (ø) 0.00 <0.00> (-6.00)
...nt/httpclient/ApacheHttpClientInstrumentation.java 41.17% <0.00%> (ø) 0.00 <0.00> (-6.00)
...pclient/LegacyApacheHttpClientInstrumentation.java 33.33% <0.00%> (ø) 0.00 <0.00> (-5.00)
...client/helper/HttpAsyncRequestProducerWrapper.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
.../agent/plugin/api/AbstractSpanInstrumentation.java 50.58% <0.00%> (+2.27%) 0.00 <0.00> (-3.00) ⬆️
.../apm/agent/plugin/api/ApiScopeInstrumentation.java 60.00% <ø> (ø) 0.00 <0.00> (-3.00)
... and 59 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f49486c...b9d6f2d. Read the comment docs.

Copy link
Copy Markdown
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you add a very useful feature, with lots of tests, and still remove more code than you added - you must have done something REALLY well 😄
Awesome refactoring 👏

@ghost
Copy link
Copy Markdown

ghost commented Apr 28, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview stats

Expand to view the summary

Build stats

Test stats 🧪

Test Results
Failed 0
Passed 1241
Skipped 12
Total 1253

@felixbarny
Copy link
Copy Markdown
Member Author

felixbarny commented Apr 28, 2020

@v1v I got a message regarding a build failure even though the tests are still running 🤔

Edit: Nevermind, that was because I added another commit that canceled the previous build.

@felixbarny
Copy link
Copy Markdown
Member Author

@eyalkoren raised some valid concerns:

I can see why a user may want to see ALL important events (like HTTP or DB), regardless of duration, but still cast a wide net on some package through trace_methods and remove fast spans. Similarly with inferred spans.
What do you say we use span_min_duration for all thresholds if set, but if the specific ones are set- give them precedence? If we are worried about too-many-config-options, we can make them internal . It’s just that we added the trace_methos threshold to address a pain, and now we will force people to accept losing meaningful events in order to use that.

In order to move forward with this, let's postpone whether or not to deprecate or replace or align the naming for the other configuration options.
Thus, I'll remove the deprecated tag and restore the profiling_inferred_spans_min_duration flag.

Copy link
Copy Markdown
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Several notes:

  • Effectively, as it is now, when multiple threshold configs are set, the higher would determine which would be discarded. I think this makes sense. I updated the config doc accordingly.
  • We have AsyncTraceMethodInstrumentationTest as well, that should be used to test async scenarios as well
  • We need tests for the combinations of threshold settings

@felixbarny felixbarny merged commit ec2b16c into elastic:master May 6, 2020
@felixbarny felixbarny deleted the span-duration-threshold branch May 6, 2020 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Need DB spans duration threshold, similar to trace_methods_duration_threshold

3 participants