Interrupt exporter retry backoff sleeps when shutdown is called. Update BatchSpan/LogRecordProcessor.shutdown to complete in 30 seconds #4638

Merged: xrmx merged 51 commits into open-telemetry:main from DylanRussell:shutdown_refactor on Jul 23, 2025

@DylanRussell (Contributor) commented Jun 16, 2025

Description

This PR updates the HTTP and gRPC OTLP exporters so that the retry backoff sleep in export is interrupted immediately (and export returns failure) when shutdown is called, instead of sleeping and retrying to completion.
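
To illustrate the idea, here is a minimal sketch of an interruptible backoff. It is not the exporter code from this PR: `SketchExporter`, its `send` callable, and the `2 ** attempt` backoff are assumptions made for the example. Only the `_shutdown_in_progress` name comes from the review discussion. The key point is that `threading.Event.wait(timeout)` returns as soon as the event is set, so the retry sleep ends the moment shutdown is called.

```python
import threading


class SketchExporter:
    """Hypothetical exporter used only to illustrate the technique."""

    def __init__(self, send, max_attempts=6):
        self._send = send  # assumed callable: returns True on success
        self._max_attempts = max_attempts
        self._shutdown_in_progress = threading.Event()

    def export(self, batch):
        for attempt in range(self._max_attempts):
            if self._shutdown_in_progress.is_set():
                return False  # do not start new attempts during shutdown
            if self._send(batch):
                return True
            backoff_seconds = 2 ** attempt  # assumed backoff schedule
            # Event.wait returns True as soon as shutdown() sets the event,
            # so this "sleep" is cut short instead of running to completion.
            if self._shutdown_in_progress.wait(backoff_seconds):
                return False
        return False

    def shutdown(self):
        self._shutdown_in_progress.set()
```

Under these assumptions, an `export` call that is stuck retrying in one thread returns `False` shortly after another thread calls `shutdown()`, rather than sleeping through the remaining backoff periods.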

This PR also updates the shutdown method of the BatchSpanProcessor and BatchLogRecordProcessor to complete after at most 30 seconds, instead of continuing to run until all telemetry has been flushed from the queue.
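
A bounded shutdown of this kind can be sketched as follows. This is an illustration under stated assumptions, not the processor code from this PR: `SketchBatchProcessor`, its `export` callable, and the internal event names are hypothetical. Only the `timeout_millis` parameter name and the 30-second default are taken from the PR text and commit log.

```python
import threading
from collections import deque


class SketchBatchProcessor:
    """Hypothetical stand-in for a batch processor; names are illustrative."""

    def __init__(self, export):
        self._export = export  # assumed callable exporting one item; may be slow
        self._queue = deque()
        self._stop = threading.Event()  # shutdown was requested
        self._deadline_passed = threading.Event()  # stop exporting entirely
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, item):
        if not self._stop.is_set():  # reject telemetry once shutdown starts
            self._queue.append(item)

    def _run(self):
        while not self._deadline_passed.is_set():
            try:
                item = self._queue.popleft()
            except IndexError:
                if self._stop.is_set():
                    return  # queue drained after shutdown was requested
                self._stop.wait(0.05)  # idle briefly until work or shutdown
                continue
            self._export(item)

    def shutdown(self, timeout_millis=30_000):
        """Return True if the queue was flushed before the timeout."""
        self._stop.set()
        # Bounded join: stop waiting once the timeout elapses, even if the
        # worker is still stuck inside an individual export call.
        self._worker.join(timeout_millis / 1000)
        self._deadline_passed.set()
        return not self._worker.is_alive()
```

With a slow `export`, `shutdown(timeout_millis=300)` returns after roughly 0.3 seconds and reports that the queue was not fully flushed. With a fast `export`, it returns as soon as the queue is empty.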

Fixes an issue where shutdown would hang or stall indefinitely, especially when export was failing inside a retry loop.

Fixes: #3309, #4043, #2663

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Lots of unit tests.

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • [x] No.

Checklist:

  • [x] Followed the style guidelines of this project
  • [x] Changelogs have been updated
  • [x] Unit tests have been added
  • [x] Documentation has been updated

@DylanRussell (Contributor Author) commented

Ok. Removed #4623 from the list of fixed issues.

@aabmass (Member) left a comment

I'm not 100% sure about the Exporter.shutdown comment. I'll think on it a little.

@aabmass (Member) left a comment

One bug, but otherwise LGTM.

DylanRussell and others added 5 commits July 18, 2025 19:47
Co-authored-by: Emídio Neto <9735060+emdneto@users.noreply.github.com>
Co-authored-by: Emídio Neto <9735060+emdneto@users.noreply.github.com>

@emdneto (Member) left a comment

Discussed offline: rename the shutdown event variable to _shutdown_in_progress.

@aabmass (Member) commented Jul 22, 2025

> discussed offline to rename the shutdown event variable to _shutdown_in_progress.

@DylanRussell is this done? Please update the branch so I can merge.

@DylanRussell (Contributor Author) commented

Yes, it's done. I renamed the variable. I've merged main now, so it should be good to go.

@xrmx xrmx merged commit d4e6068 into open-telemetry:main Jul 23, 2025
550 of 564 checks passed
@github-project-automation github-project-automation bot moved this from Ready for review to Done in Python PR digest Jul 23, 2025
JWinermaSplunk pushed a commit to JWinermaSplunk/opentelemetry-python that referenced this pull request Feb 17, 2026
…te BatchSpan/LogRecordProcessor.shutdown to complete in 30 seconds (open-telemetry#4638)

* Initial commit to add timeout as a parm to export, make retries encompass timeout

* Fix lint issues

* Fix a bunch of failing style/lint/spellcheck checks

* Remove timeout param from the export calls.

* Fix flaky windows test ?

* Respond to review comments..

* Delete exponential backoff code that is now unused

* Add changelog and remove some unused imports..

* fix typo and unit test flaking on windows

* Refactor tests, HTTP exporters a bit

* Remove unneeded test reqs

* Remove gRPC retry config

* Tweak backoff calculation

* Lint and precommit

* Empty commit

* Another empty commit

* Calculate backoff in 1 place instead of 2

* Update changelog

* Update changelog

* Make new _common directory in the http exporter for shared code

* precommit

* Make many changes

* Reorder shutdown stuff

* Fix merging

* Don't join the thread in case we are stuck in an individual export call

* Add tests, changelog entry

* Update time assertions to satisfy windows.. Fix lint issues

* Skip test on windows

* Use threading Event instead of sleep loop.

* Respond to review comments..

* Pass remaining timeout to shutdown

* Run precommit

* Change variable names

* Switch timeout back to timeout_millis

* Update CHANGELOG.md

Co-authored-by: Emídio Neto <9735060+emdneto@users.noreply.github.com>

* Update CHANGELOG.md

Co-authored-by: Emídio Neto <9735060+emdneto@users.noreply.github.com>

* Rename variable

* Fix variable name

---------

Co-authored-by: Emídio Neto <9735060+emdneto@users.noreply.github.com>

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Exporters shutdown takes longer then a minute when failing to send metrics/traces

4 participants