Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@             Coverage Diff              @@
##              trunk   #11293      +/-   ##
============================================
- Coverage     41.40%   41.39%   -0.02%
  Complexity     5217     5217
============================================
  Files          1057     1057
  Lines         61889    61911      +22
  Branches       8450     8452       +2
============================================
  Hits          25627    25627
- Misses        33959    33981      +22
  Partials       2303     2303
```

☔ View full report in Codecov by Sentry.
hichamboushaba
left a comment
@kidinov sorry for the delay, I was AFK last week.
I have some concerns about this: switching to a coroutine here causes the code to be dispatched to a different thread (because of the use of the default `Dispatchers.Default`), which raises two issues:

1. The event timestamp will be slightly behind the actual tracking call, on the order of tens or hundreds of milliseconds depending on the load of the `Dispatchers.Default` queue.
2. The ordering of events won't be guaranteed anymore.

Regarding 2, I googled a bit, and it seems to be the case: the official guarantee for coroutine dispatching is that it's *almost* FIFO, not exactly FIFO, and I was able to reproduce the reordering by adding some logging:
```diff
diff --git a/WooCommerce/src/main/kotlin/com/woocommerce/android/analytics/AnalyticsTracker.kt b/WooCommerce/src/main/kotlin/com/woocommerce/android/analytics/AnalyticsTracker.kt
--- a/WooCommerce/src/main/kotlin/com/woocommerce/android/analytics/AnalyticsTracker.kt (revision 1ee5b1704eb57f0098c441d555620d24e039b131)
+++ b/WooCommerce/src/main/kotlin/com/woocommerce/android/analytics/AnalyticsTracker.kt (date 1713181146718)
@@ -68,6 +68,7 @@
     }

     private fun track(stat: AnalyticsEvent, properties: Map<String, *>) {
+        WooLog.i(T.UTILS, "Tracking ${stat.name.lowercase()}, Properties: $properties")
         appCoroutineScope.launch {
             if (tracksClient == null) {
                 return@launch
```

Which resulted in the following:

```
2024-04-15 12:39:47.565 11021-11021 WooCommerce-UTILS com.woocommerce.android.dev I Tracking app_feedback_prompt, Properties: {action=shown}
2024-04-15 12:39:47.565 11021-11021 WooCommerce-UTILS com.woocommerce.android.dev I Tracking view_shown, Properties: {name=MyStoreFragment}
2024-04-15 12:39:47.568 11021-11193 WooCommerce-UTILS com.woocommerce.android.dev I 🔵 Tracked: view_shown, Properties: {"name":"MyStoreFragment","is_debug":true,"site_url":"https:\/\/hichamwootest.blog","cached_woo_core_version":"8.7.0"}
2024-04-15 12:39:47.568 11021-11192 WooCommerce-UTILS com.woocommerce.android.dev I 🔵 Tracked: app_feedback_prompt, Properties: {"action":"shown","blog_id":226227340,"is_wpcom_store":true,"was_ecommerce_trial":true,"plan_product_slug":"wooexpress-small-bundle-yearly","store_id":"1d798d53-bded-499f-b0ef-a2a7a312f35d","is_debug":true,"site_url":"https:\/\/hichamwootest.blog","cached_woo_core_version":"8.7.0"}
```
Notice that `view_shown` was logged before `app_feedback_prompt` even though they were requested in the opposite order.
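The same reordering can be reproduced outside the app. This is a minimal standalone sketch (just kotlinx-coroutines, none of the app's code): two `launch` calls issued back to back on `Dispatchers.Default` can land on different worker threads, so the order in which their bodies run is not guaranteed.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentLinkedQueue

// Minimal repro sketch: for each iteration, "first" is launched before
// "second" on Dispatchers.Default, yet the two bodies may execute in
// either order because the dispatcher is only *almost* FIFO.
fun countInversions(iterations: Int = 1_000): Int = runBlocking {
    val processed = ConcurrentLinkedQueue<String>()
    repeat(iterations) { i ->
        val first = launch(Dispatchers.Default) { processed.add("first-$i") }
        val second = launch(Dispatchers.Default) { processed.add("second-$i") }
        joinAll(first, second)
    }
    val events = processed.toList()
    // Count iterations where "second" was recorded before "first".
    (0 until iterations).count { i ->
        events.indexOf("second-$i") < events.indexOf("first-$i")
    }
}

fun main() {
    // With enough iterations, a handful of inversions typically show up.
    println("inversions: ${countInversions()} out of 1000")
}
```

The exact inversion count varies by run and machine load; the point is only that it is not reliably zero.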
I get that the issue here is that we added a DB call when logging the WC version in `GetWooCorePluginCachedVersion`. If I'm right in this assumption, then I suggest we cache the version in memory and use that for all events instead of hitting the DB; given how frequently Tracks events fire, this would also perform better.
Since we fetch the site's plugins on app launch and when switching sites here, those would be good candidates for when to cache the value 🤔, WDYT?
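A sketch of the in-memory cache idea suggested above. All names here (`WooVersionCache`, `loadFromDb`) are illustrative stand-ins, not the app's actual API around `GetWooCorePluginCachedVersion`.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: read the cached Woo core version from the DB at most
// once per site, then serve subsequent reads from memory.
class WooVersionCache(private val loadFromDb: (siteId: Long) -> String?) {
    private val cache = ConcurrentHashMap<Long, String>()

    // Fast path for every tracked event: no DB access after the first read.
    fun get(siteId: Long): String? =
        cache[siteId] ?: loadFromDb(siteId)?.also { cache[siteId] = it }

    // Refresh point: call when plugins are (re)fetched, i.e. on app launch
    // and when switching sites, as suggested above.
    fun update(siteId: Long, version: String) {
        cache[siteId] = version
    }
}
```

With something like this, the tracking path would only touch the DB on a cold start, and the refresh hook keeps the cached value in step with the plugin fetches.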
Thank you for the really good points here! 🙏
Yep, this is correct. But I'd still really want to fix this in a generic way, as we hit the IO in more places than just this one. Overall, we should strive to do as little work as possible on the main thread, and this looks like a relatively simple way of improving the situation with the app being jerky.
That value is cached on the backend itself, and then we cache it in the DB too. It's already quite challenging to make sure we track the relevant value, and we actually already know that the value won't be relevant all the time. Introducing another level of cache would create yet another level of ambiguity and complexity, which I'd like to avoid.
Using the local timestamp may not be accurate anyway, because the time on the device can be set to any value. Additionally, setting the time on the main thread does not guarantee that we record the actual event time either, since we record the time at which the main thread managed to hit the tracking call.
That sounds like a more important issue to me, as Tracks events are sometimes used for debugging. Having said that, we already don't really have a guaranteed order, since we sometimes call this from inside coroutines. I added a mutex to mitigate that: c464f20
Thanks @kidinov for the remarks. One suggestion that might work without the issues of the current solution is to launch the coroutine on a dispatcher that uses a single thread (`newSingleThreadContext`), or alternatively to not use coroutines at all and instead have a separate thread handle the logging of events. I think that by forcing a single background thread we'll avoid the issues I shared above, but I haven't tested it to confirm.
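For illustration, here is a standalone sketch of the single-threaded-dispatcher suggestion (not the app's code). With one worker thread, coroutines launched from the same caller are enqueued and executed in submission order, which is what should avoid the reordering seen with `Dispatchers.Default`.

```kotlin
import kotlinx.coroutines.*

// Sketch: confine all tracking work to one background thread via
// newSingleThreadContext. A single worker drains a FIFO queue, so events
// launched from the same thread keep their order.
@OptIn(DelicateCoroutinesApi::class, ObsoleteCoroutinesApi::class)
fun runOrdered(n: Int = 100): Boolean = runBlocking {
    val trackingDispatcher = newSingleThreadContext("tracking")
    val order = mutableListOf<Int>()
    val jobs = (1..n).map { i ->
        launch(trackingDispatcher) { order.add(i) } // off the caller's thread
    }
    jobs.joinAll()
    trackingDispatcher.close() // release the backing thread when done
    order == (1..n).toList()
}

fun main() {
    // One worker thread means the n events come out in submission order.
    println(runOrdered())
}
```

Note that in the app the dispatcher would live as long as the `AnalyticsTracker` itself, so the `close()` call here is only for the standalone sketch.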
Do you mean to have a constantly running thread that reads from some kind of queue of events? Does that solve the timing issue? Regarding using a native thread, what advantages do you see? To me, it sounds like:
- we need to stop it somehow to not leak;
- it is just way heavier than a coroutine in terms of resources.

The timing issue could be solved by recording the time before handing the event to any parallel code, but to do that we would have to change the Tracks library, as it currently sets the time internally. And, as I mentioned above, I don't think this is something we should worry about. As I realized that the mutex approach doesn't give us a guarantee of sequential/ordered tracking, I changed it to a producer-consumer setup with a channel in between. Wdyt?
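The producer-consumer shape described above can be sketched like this (illustrative names, not the PR's actual code): `track()` only enqueues into an unbounded `Channel`, which is cheap and non-suspending on the caller's thread, while a single consumer coroutine drains the channel and processes events one at a time, preserving enqueue order.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Illustrative sketch of the producer-consumer approach: the caller never
// blocks, and a lone consumer guarantees sequential, FIFO processing.
class EventQueue(scope: CoroutineScope, private val send: suspend (String) -> Unit) {
    private val channel = Channel<String>(Channel.UNLIMITED)

    init {
        scope.launch(Dispatchers.Default) {
            for (event in channel) send(event) // one at a time, in order
        }
    }

    fun track(event: String) {
        channel.trySend(event) // always succeeds on an UNLIMITED channel
    }

    fun close() = channel.close() // lets the consumer loop finish draining
}
```

A timestamp could also be captured inside `track()` before the enqueue, which would address the timing concern, though as noted above that would require the Tracks library to accept an externally supplied time.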
hichamboushaba
left a comment
> Do you mean to have a constantly running thread that reads from some kind of queue of events? Does that solve the timing issue?

Yes, it should fix the ordering issue, as it'll handle events one by one, which means no parallelism. The timestamp would still be impacted, but that's the same for all solutions that involve some background work.

> Regarding using a native thread. What advantages do you see?

I didn't say it's better; I offered it just as a similar suggestion, since it's essentially the same as the single-threaded dispatcher with coroutines, just with a different API.

> We need to stop that somehow to not leak

Why? The AnalyticsTracker runs for the whole duration of the app, the same as the AppCoroutineScope that we don't have to cancel.

> That is just way heavier than a coroutine in terms of resources

I don't think so. Coroutines are more lightweight than threads when we talk about a big number of them, but in this case we'll create a thread for either the single-threaded dispatcher or this solution, so the native-thread solution might actually be lighter, as it doesn't involve any additional coroutine machinery.

> And as I mentioned above I don't think that this is something we should worry about

I agree, it's a minor issue; the ordering is the more serious one.

> As I realized that the mutex approach doesn't give us the guarantee of sequential/ordered tracking, I changed that to producer-consumer with the channel in between. Wdyt?

Yes, this solution looks better, thanks for working on it. Since I didn't test the changes, I'll wait for @JorgeMucientes to review the PR as well.
Hey @kidinov, thanks for working on this! Great job 🥇 I just left a minor suggestion.
hi there @kidinov 👋 I've spotted this PR and have a question:
Could you please point me to the line you mean? I think I can't spot it in
…AnalyticsTracker.kt Co-authored-by: Jorge Mucientes <jorge.mucientes@automattic.com>
@wzieba 👋
Yeah, the call we hit there is IO: SharedPreferences is just an XML file on disk. Also, the site is stored in the DB; we cache that internally in memory, but the first hit will still be IO here.
Just pointing out something here 🙂: I think it's generally OK to read SharedPreferences on the main thread. Once they are initialized, they are cached in memory; the original issue with them was when writing with
@kidinov Have you noticed or measured any improvements related to this change? I'm concerned that this PR makes
@wzieba 👋 I have measured it, though not in an exact manner: just by recording the time before and after the method execution. It sometimes takes 2 ms, sometimes 14, sometimes 40. On some screens we track a few events at the same time, so the overall lag can be noticeable, and it adds up to the whole experience of the app being quite jerky.
If we do it in the background, I don't see why it would be misused. It seems like legitimate use to me.
We already call this off the main thread in quite a few places. And I don't add any concurrency: events are handled one by one, the same as before, just without using the main thread.
Yep, it is cached in memory, but at the same time there is some blocking synchronization going on too. While it may be "ok" to access it from the main thread occasionally, here we do it 3 or more times per user action, right after that action, which in my opinion is unnecessary and better avoided. Having said that, if there is a specific issue you see, I am totally fine with fixing it.
Thanks @kidinov for the comment.
For my learning, can you please elaborate on this? It's not clear to me what's meant by "blocking synchronization".
Just to confirm before @wzieba shares his opinion: were these measurements done with or without the DB call for getting the WooCommerce version? If they included it, then I think a fairer approach would be to measure with the value cached in memory.
Every access method in the code you linked acquires a lock.
That lock seems to be acquired not only on read access but also on writes to memory and to disk. There is also a `startReloadIfChangedUnexpectedly` method, which is public, so it's not clear when it's called, but it also locks all read access (I don't want to spend time researching when that happens, tbh). So basically, that shared lock covers the whole SharedPreferences instance, not individual keys: if there is any access to any key in the shared prefs, we wait for it to finish.
I measured without the DB access.
Thanks so much @kidinov for the explanation 🙏. I checked the code and looked into the lock. In any case, this was just for my learning and I appreciate it; when it comes to the changes of this PR, I'm OK with either approach, and I think the concern raised by @wzieba is a valid one that should be discussed.
@kidinov @hichamboushaba thanks for sharing more context and leaving your thoughts!
I'd discourage pursuing ideas that involve DB queries for each tracking method invocation, especially since, as you say, a few of them might happen at the same time. I just think it's an inefficient use of resources that might, at a bigger scale (in terms of access frequency), slow down the app (the database is, in the end, a single resource that has to be locked and unlocked).
What I meant is that the Tracks SDK is designed to be synchronous to save consumers from problems like the one we're addressing here. I'd argue that "we double the effort" rather than "we shouldn't do this because a side effect will happen". To better understand the problem we're discussing, I dug into the profiles we collect via Sentry Performance Monitoring and searched for outliers that could emphasize any issues. For example, let's take this profile: https://a8c.sentry.io/profiling/profile/woocommerce-android/010df37d5406486f86ff194484b78b61/flamegraph/?colorCoding=by+system+frame&fov=2945841152%2C19%2C1205526016%2C17&query=AnalyticsTracker.track&sorting=call+order&tid=17511&view=top+down and look for `AnalyticsTracker.track`. Going deeper into the execution tree, we can see that at the end of it
Looking at the source code of the Tracks SDK, I can see that, even though the SDK was designed to be synchronous, before adding an event to a queue it performs multiple operations which can be time-consuming. I believe this PR addresses that problem, so we should see the improvement. That's good news 😃. But generally, whenever we address those performance problems in the Tracks SDK itself, I'd suggest reconsidering this change and validating whether keeping the logging off the main thread is still necessary. Thanks for the discussion @kidinov @hichamboushaba, I've learned more thanks to it 🙌
Thanks for the excellent investigation @wzieba, and thanks @kidinov and @hichamboushaba for keeping the discussion going; I'm learning so much from it 👏🏼 🥇 @wzieba, really nice find on which tasks running inside the Tracks lib are really time-consuming. TIL about the Sentry profiles.
Thanks!
This is a good question @JorgeMucientes! For now, I was only able to see the average time of specific methods via the summaries on the main page (https://a8c.sentry.io/profiling/?project=1459556&statsPeriod=90d). From it we can see the P75 of
I think it is indeed anecdotal. In fact, we profile 1% of 2% of transaction events, which is, I guess, far from statistical significance. But the fact that
I found this view: https://a8c.sentry.io/profiling/summary/woocommerce-android/?environment=release&project=1459556&statsPeriod=90d&transaction=OrderDetails&view=flamegraph which shows the "Aggregated duration of this frame across different samples". On it we can see that some
Really nice find @wzieba, thank you so much.
➕ to this.
Closes: #11292
Description
While working on another issue, I noticed that in many cases we call the "track" method on the main thread, and inside it we hit the device's IO. This can make the app less responsive and jerky. The PR wraps this code in a coroutine with the `appCoroutineScope` scope. As this is a potentially dangerous change, @hichamboushaba @JorgeMucientes, would you mind thinking of potential issues with this approach?

Testing instructions

`RELEASE-NOTES.txt` if necessary. Use the "[Internal]" label for non-user-facing changes.