Skip to content

Used BatchSpansProcessor instead of SimpleSpansProcessor#393

Merged
trask merged 12 commits into
open-telemetry:masterfrom
RashmiRam:master
May 16, 2020
Merged

Used BatchSpansProcessor instead of SimpleSpansProcessor#393
trask merged 12 commits into
open-telemetry:masterfrom
RashmiRam:master

Conversation

@RashmiRam

Copy link
Copy Markdown
Contributor

BatchSpansProcessor won't get blocked by the exporter.
Fixes #368

@trask This is the quick fix. Are we working on introducing the disruptor in the future? If that's the case, Can this just be a stop gap for now?

BatchSpansProcessor won't get blocked by the exporter.

Fixes open-telemetry#368

Signed-off-by: RashmiRam <rashmi.ramanathan@freshworks.com>
@RashmiRam

Copy link
Copy Markdown
Contributor Author

I signed it

@johnbley

johnbley commented May 8, 2020

Copy link
Copy Markdown
Member

Hi Rashmi -
Thanks for the patch! It looks like this is causing the smoke-tests to fail (you can click through the "details" to see the build logs) . You can reproduce this locally by running (e.g.) ./gradlew :smoke-tests:springboot:test. In SpringBootSmokeTest.groovy we use a logging exporter and a simple log reader to check that the spans get logged. Perhaps there's a clean way to explicitly forceFlush the batch or to special-case the logging processor to not be batched?

@RashmiRam

Copy link
Copy Markdown
Contributor Author

I was trying to run failure test alone as entire instrumentation test takes hours to run. I'll check that. Thanks for the help @johnbley

@RashmiRam

Copy link
Copy Markdown
Contributor Author

I fixed the springboot smoke tests. Now, I see random failures in instrumentation tests(cassandra, akka-hhtp) which are passing in my local. Any help would be much appreciated.

@iNikem

iNikem commented May 8, 2020

Copy link
Copy Markdown
Contributor

Yeah, our tests are quite flaky right now. We are working on it :) @trask can you please rerun the pipeline?

@iNikem

iNikem commented May 9, 2020

Copy link
Copy Markdown
Contributor

@RashmiRam I see that check job has failed in CI with code formatting violation. Can you please run ./gradlew googleJavaFormat as suggested by CONTRIBUTING.md?

@RashmiRam

RashmiRam commented May 9, 2020

Copy link
Copy Markdown
Contributor Author

I did @iNikem and ran ./gradlew auto-tooling:verifyGoogleJavaFormat to find any pending violations. It is passing. I am not sure why the tests in CI are failing

@iNikem

iNikem commented May 9, 2020

Copy link
Copy Markdown
Contributor

I have checked out your repo and tried to run ./gradlew auto-tooling:verifyGoogleJavaFormat:

The following files are not formatted properly:

/Users/nikem/Documents/workspace/mtp/opentelemetry-auto-instr-java/agent-tooling/src/main/java/io/opentelemetry/auto/tooling/TracerInstaller.java

Did you commit your changes?

@RashmiRam

Copy link
Copy Markdown
Contributor Author

Yes. I did This is the commit. c4383ba

@johnbley

Copy link
Copy Markdown
Member

Yes. I did This is the commit. c4383ba

I've seen googleJavaFormat's cache get confused at times; try adding some whitespace, saving, and then re-running it.

@RashmiRam

Copy link
Copy Markdown
Contributor Author

That helped @johnbley check job is passing now in CircleCI. Thanks a ton.


// // Setting configuration variables of batch span processor through env vars
// // This config is to immediately flush a batch of 1 span with delay of 10ms
processBuilder.environment().put("OTEL_BSP_MAX_EXPORT_BATCH", "1")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution a lot. Much cleaner than anything I had suggested.

@trask

trask commented May 11, 2020

Copy link
Copy Markdown
Member

@RashmiRam

Copy link
Copy Markdown
Contributor Author

@trask I have updated the timeout in both play & springbok smoke tests to 20s as 15s fails at times even in my local.

@trask trask left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RashmiRam!

Comment thread smoke-tests/play/src/test/groovy/io/opentelemetry/smoketest/PlaySmokeTest.groovy Outdated
@trask

trask commented May 16, 2020

Copy link
Copy Markdown
Member

Hey @RashmiRam, sorry we haven't merged this yet. The play smoke test is still failing in CI even after I bumped the timeout to 30 seconds. I'm looking into it now.

@trask

trask commented May 16, 2020

Copy link
Copy Markdown
Member

The problem is that our smoke tests use the logging exporter and parse the log file which is not reliable under concurrent logging, which can happen now with the background BatchSpanProcessor thread.

Here's an example of the problem, where LOGGED_SPAN didn't end up on it's own line:

[opentelemetry.auto.trace 2020-05-14 22:26:38:428 +0000] [application-akka.actor.default-dispatcher-4] DEBUG io.opentelemetry.auto.bootstrap.instrumentation.java.concurrent.State - Failed to set parent span because another parent span is already set io.opentelemetry.auto.bootstrap.instrumentation.java.concurrent.State@448e2484: new: io.opentelemetry.sdk.trace.RecordEventsReadableSpan@3dc393cf, old: io.opentelemetry.sdk.trace.RecordEventsReadableSpan@3dc393cfLOGGED_SPAN GET /welcome 8c4cd5a61ad2794e

The real solution here is to improve our test harness, I made a note here: #298 (comment)

For now though, the debug logging Failed to set parent span that is causing the interference is unnecessarily verbose, so I fixed it. I had fixed it in #318, but I had reverted my change in #362 because of DataDog/dd-trace-java#1403 (review). I think re-applying #318 is good anyways, and it resolves this issue for now, until we improve the test harness.

@trask trask merged commit 29a18bd into open-telemetry:master May 16, 2020
@RashmiRam

RashmiRam commented May 17, 2020

Copy link
Copy Markdown
Contributor Author

Thank you so much for the detailed explanation on the test failures @trask and Thanks for the merge too. It would be great if you let me know about the release date for this too.

@trask

trask commented May 17, 2020

Copy link
Copy Markdown
Member

Releasing v0.3.0 tomorrow (Mon, May 18) 👍

schmikei pushed a commit to schmikei/opentelemetry-java-instrumentation that referenced this pull request Apr 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Don't use synchronous span processor

4 participants