Skip to content

RUM-11821: Fix crash in KronosTimeProvider#3014

Merged
aleksandr-gringauz merged 7 commits into
developfrom
aleksandr-gringauz/RUM-11821/fix-kronos-crash
Nov 20, 2025
Merged

RUM-11821: Fix crash in KronosTimeProvider#3014
aleksandr-gringauz merged 7 commits into
developfrom
aleksandr-gringauz/RUM-11821/fix-kronos-crash

Conversation

@aleksandr-gringauz

@aleksandr-gringauz aleksandr-gringauz commented Nov 19, 2025

Copy link
Copy Markdown
Contributor

What does this PR do?

See the ticket for a stacktrace.

I haven't managed to determine how widespread this crash is, I can't find it in the list of crashes on google play console, I only can see it through the link shared by a single customer though google play console.

I haven't managed to reproduce it either.

The crash happens most likely because (it is the most plausible explanation I can come up with):

  1. We call kronosClock.shutdown here.
  2. After that kronosClock.getCurrentTimeMs is called as a result of this call to sdkCore.time.deviceTimeNs.
  3. This ensureServiceIsRunning call throws IllegalStateException here.
  4. We interact with already shutdown kronos clock, because this sdkCore.time.deviceTimeNs can get the old timeProvider, after that kronosClock.shutdown is called on another thread and finally sdkCore.time.deviceTimeNs interacts with this clock using the old 'timeProvider`.

This PR does the following:
Add try/catch in KronosTimeProvider so that it becomes physically impossible to crash if by some reason we interact with an already shutdown kronos clock.

We could make timeProvider volatile. But it will not help with the point 4. sdkCore.time.deviceTimeNs is not "atomic", clock can be shut down at any point during execution of this code, so the best we can do is try/catch.

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@aleksandr-gringauz aleksandr-gringauz force-pushed the aleksandr-gringauz/RUM-11821/fix-kronos-crash branch from ba267f4 to effd411 Compare November 19, 2025 12:55
@codecov-commenter

codecov-commenter commented Nov 19, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.23%. Comparing base (c633626) to head (7cffdcd).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3014      +/-   ##
===========================================
+ Coverage    71.16%   71.23%   +0.07%     
===========================================
  Files          859      859              
  Lines        31429    31444      +15     
  Branches      5298     5302       +4     
===========================================
+ Hits         22364    22396      +32     
+ Misses        7565     7550      -15     
+ Partials      1500     1498       -2     
Files with missing lines Coverage Δ
...n/com/datadog/android/core/internal/CoreFeature.kt 85.34% <100.00%> (ø)
...g/android/core/internal/time/KronosTimeProvider.kt 100.00% <100.00%> (ø)

... and 42 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aleksandr-gringauz aleksandr-gringauz force-pushed the aleksandr-gringauz/RUM-11821/fix-kronos-crash branch from effd411 to 4148140 Compare November 19, 2025 14:10
@aleksandr-gringauz aleksandr-gringauz marked this pull request as ready for review November 19, 2025 16:44
@aleksandr-gringauz aleksandr-gringauz requested review from a team as code owners November 19, 2025 16:44
0xnm
0xnm previously approved these changes Nov 20, 2025
}.onFailure { ex ->
internalLogger.log(
level = InternalLogger.Level.WARN,
targets = listOf(InternalLogger.Target.MAINTAINER, InternalLogger.Target.TELEMETRY),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit afraid that adding TELEMETRY target here may cause a recursion loop. Yes, we have onlyOnce = true, but can you double-check that no time info is gathered between log call is made and onlyOnce check is actually hit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning it, I haven't thought about this situation.

I checked that:

  1. Yes, recursion indeed happens because here when DatadogContext is being created it calls timeProvider.getServerTimestamp.
  2. But onlyOnce helps, I observe that this line gets executed when the logging is called on the second iteration of "recursion".

So looks like everything is fine here.

Comment thread detekt_custom_safe_calls.yml
jonathanmos
jonathanmos previously approved these changes Nov 20, 2025
ambushwork
ambushwork previously approved these changes Nov 20, 2025
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Nov 20, 2025

Copy link
Copy Markdown

🎯 Code Coverage
Patch Coverage: 100.00%
Total Coverage: 71.35% (+0.04%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 7cffdcd | Docs | Datadog PR Page | Was this helpful? Give us feedback!

0xnm
0xnm previously approved these changes Nov 20, 2025
jonathanmos
jonathanmos previously approved these changes Nov 20, 2025
@aleksandr-gringauz aleksandr-gringauz merged commit f786365 into develop Nov 20, 2025
26 checks passed
@aleksandr-gringauz aleksandr-gringauz deleted the aleksandr-gringauz/RUM-11821/fix-kronos-crash branch November 20, 2025 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants