Simplify AnalyticsService code #2308
benhalpern merged 13 commits into forem:master from rhymes:rhymes/simplify-analytics
Conversation
@Zhao-Andy I'm pinging you here (let me know what you think 😀)
Thanks for the refactor! One concern I had, though, with using SQL queries was that it would be a lot of queries if we're covering a large timeframe.
@Zhao-Andy you're right, that could be a concern. If the requester asks for a huge range of dates, that could mean a lot of queries, but each query is still bounded by the requested dates, so in theory there's no risk of querying millions of records. Also, we can add indexes in the future to speed up queries (for example, those tables don't have indexes on created_at, which I'd consider adding in both cases).

One could benchmark the two versions in terms of speed and memory and see what happens, but you'd need a realistic DB (and a bit of spare time) to do that, and I can't with the development DB.

There's a third option, which is to see if we can re-organize some of the queries to take advantage of SQL's group by.

What do you think about the other points I've left out?
Co-Authored-By: rhymes <rhymesete@gmail.com>
We might want to add a hard limit on how large the date range can be. Twitter's API, for instance, only allows requesting up to 4 weeks at a time. A 30-day period would mean 270 queries, but I really don't know enough about performance to say whether that's better or worse than doing it in memory. :(
I think adding indexes would make sense.
I might try this; I'll let you know how it goes if I do.
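To illustrate the hard-limit idea above, here is a minimal sketch (names and the 28-day cap are hypothetical, not the PR's actual code) of clamping a requested range so one API call can never fan out into an unbounded number of queries:

```ruby
require "date"

# Hypothetical guard for AnalyticsService: cap the requested range,
# e.g. mirroring Twitter's four-week limit mentioned above.
MAX_RANGE_DAYS = 28

def clamp_range(start_date, end_date, max_days: MAX_RANGE_DAYS)
  clamped_end = [end_date, start_date + max_days - 1].min
  start_date..clamped_end
end

# A year-long request gets clamped down to 28 days:
range = clamp_range(Date.new(2019, 1, 1), Date.new(2019, 12, 31))
range.count # => 28
```

The controller could either clamp silently like this or reject over-long ranges with a 422; either way the query count per request stays bounded.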
I actually used `group_by`. The issue with it was that it was leaving out dates that had no data. I guess we can solve that by setting a default value of sorts before grouping. Personally I'm leaning toward an implementation that uses SQL + `group by`.
Quick idea: make a model/table to store the data by day/week and only calculate for new data, since the old data will stay the same. We can use …
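A dependency-free sketch of that precompute idea, with a Hash standing in for the hypothetical daily-stats table (all names here are illustrative, not the PR's code):

```ruby
require "date"

# Stand-in for a per-day stats table: old days never change, so we only
# ever compute days after the last stored one.
stored = { Date.new(2019, 4, 1) => 10, Date.new(2019, 4, 2) => 7 }

# Placeholder for the real per-day aggregation query.
def compute_day(_day)
  0
end

def refresh(stored, today)
  last = stored.keys.max
  ((last + 1)..today).each do |day|
    stored[day] = compute_day(day)
  end
  stored
end

refresh(stored, Date.new(2019, 4, 4))
stored.keys.size # => 4 (two stored days plus two freshly computed ones)
```

A recurring job would call something like `refresh` per pro user; reads then hit the precomputed table instead of re-aggregating the whole range.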
Oh, that's a good idea. Maybe in a separate PR, though.
I'd add these in another PR as well; first I would merge this and see if we still have the timeouts you mentioned in #2307. One step at a time: performance improvements should always have some kind of benchmark or self-evident reason behind them eheh
That's `group_by` for an enumerable, though, which requires all the objects to be in memory. The group by I was referring to was SQL/ActiveRecord's: https://guides.rubyonrails.org/active_record_querying.html#group. But once again, I would do this only if and when needed.
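For posterity, the two flavors of group by under discussion, in a runnable sketch (records and dates are made up for illustration): `Enumerable#group_by` buckets already-loaded Ruby objects in memory and silently skips days with no records, which is the gap Zhao-Andy hit, while ActiveRecord's `.group` pushes the aggregation into SQL.

```ruby
require "date"

# Pretend these rows were already fetched from the DB; Enumerable#group_by
# only works on objects that are in memory.
Record = Struct.new(:created_at)
records = [Record.new(Date.new(2019, 4, 1)),
           Record.new(Date.new(2019, 4, 1)),
           Record.new(Date.new(2019, 4, 3))]

range = Date.new(2019, 4, 1)..Date.new(2019, 4, 3)

# Plain group_by: April 2nd is simply absent from the result.
counts = records.group_by(&:created_at).transform_values(&:size)

# The "default value" fix: seed every date in the range with 0, then merge.
full_counts = range.to_h { |d| [d, 0] }.merge(counts)
# => { 2019-04-01 => 2, 2019-04-02 => 0, 2019-04-03 => 1 }

# The SQL/ActiveRecord equivalent would aggregate in the database instead,
# e.g. Model.where(created_at: range).group("DATE(created_at)").count,
# so only per-day counts travel over the wire (it has the same
# missing-days gap, and the same seed-and-merge fix applies).
```

The trade-off discussed above is exactly this: the in-memory version loads every record; the SQL version issues more queries but transfers only counts.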
ahhahaha great pun :D
Hi @aspittel, I have a question: is it still slow with this PR? I can't see it from my end. Because going from real-time queries to batching requires a lot more code and coordination (as you said, you'd need a new model and recurring jobs to update the table/view for each pro user and to calculate the "diff").
Everything has been fast for me so far. My concern is that since we're using this for a data dashboard with presumably a lot of weeks on it, we may want to consider that down the road! I think this is great for now, but we may want to think about further optimizations in the future!
It's not crystal clear what the right answer is, but I'm down to merge this. I think @aspittel's serialization ideas make sense but, as @rhymes noted, it's a lot of new code! What if we make use of low-level/Russian-doll Rails caching wherever possible to solve these problems for now? And once the API stabilizes and our bigger concerns are perf/scaling, we can look to solve those problems then.
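The low-level caching Ben mentions is Rails' `Rails.cache.fetch(key) { expensive_work }`, which runs the block on a cache miss and serves the stored value afterwards. A dependency-free sketch of the same read-through pattern, using a plain Hash as a stand-in for the Rails cache store (the key and method names are hypothetical):

```ruby
# Stand-in cache store; in Rails this would be Rails.cache.fetch(key) { ... }
# with an expiry, e.g. expires_in: 12.hours.
CACHE = {}

def cache_fetch(key)
  return CACHE[key] if CACHE.key?(key)
  CACHE[key] = yield
end

calls = 0
expensive = -> { calls += 1; { "2019-04-01" => 42 } }

first  = cache_fetch("analytics/2019-04-01") { expensive.call }
second = cache_fetch("analytics/2019-04-01") { expensive.call }

calls # => 1 (the second lookup was served from the cache)
```

For the analytics endpoints, a date-scoped cache key would let past days (whose data never changes) skip the queries entirely, which is most of the batching win with a fraction of the code.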
No, it's not clear what the right answer is; there's a little trial and error to go through. A summary of possible improvements for posterity:
I think that's all that has been proposed in this discussion. Thanks for the merge!
@lightalloy I collected all possible improvements discussed and my thoughts here #2311 (it's a longish read :D) |
What type of PR is this? (check all applicable)
Description
This refactoring mainly does three things:
- simplifies the `AnalyticsService` interface

A few things I've left out that should be discussed/decided:
- `AnalyticsService` should probably be renamed to `Analytics::Pro` as suggested by @lightalloy
- `date.strftime("%a, %m/%d")` is locale dependent and potentially repeats itself (if start_date and end_date go past a year there could be dates with the same string version). Also, users outside the US could misinterpret dates like "03/04" (3rd of March or 4th of April?). I suggest using the ISO "2019-04-01" format as the key for the hash.

Related Tickets & Documents
Closes #2307
Mobile & Desktop Screenshots/Recordings (if there are UI changes)
Added to documentation?