Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Eliminate usage stats deprecated package#40674

Merged
dadlerj merged 11 commits into
mainfrom
eliminate_usage_stats_deprecated_package-gh
Aug 24, 2022
Merged

Eliminate usage stats deprecated package#40674
dadlerj merged 11 commits into
mainfrom
eliminate_usage_stats_deprecated_package-gh

Conversation

@dadlerj

@dadlerj dadlerj commented Aug 22, 2022

Copy link
Copy Markdown
Member

Revisiting https://github.com/sourcegraph/sourcegraph/pull/38826

That PR had to be reverted due to slow performance and timeouts on the admin analytics page on dotcom. After testing locally with >30 million events in my local db, I could reproduce it (generating the admin chart took ~3 minutes).

The implementation relied on a SQL query that took ~20-30 seconds, which was called 9 times. I've slimmed it down to just one query that takes about the same amount of time (~30s).

First, I removed the integration user data—it was never actually used in any of the calls to this method.

Second, I changed the SQL to collect both registered and anon user data in a single pass.

Finally, I changed the SQL to aggregate users into DAU, WAU, and MAU buckets in a single pass as well.

I borrowed from the SiteUsage query (used for ping data aggregation), which is largely duplicative with this one, but is missing one key feature: It only allows you to pull the current DAUs, WAUs, and MAUs. If you want, e.g., "the last 3 months" of MAUs, you have to use this query/API call instead. I updated the naming of both functions to make it clear that they are related.

For reviewers, fell free to just look at the last four commits. The first six are identical to the first PR (just rebased).

Test plan

Tested locally

@dadlerj dadlerj requested review from efritz and unknwon August 22, 2022 14:37
@cla-bot cla-bot Bot added the cla-signed label Aug 22, 2022
@sourcegraph-bot

sourcegraph-bot commented Aug 22, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff bdb0a0d...98e341f.

Notify File(s)
@philipp-spiess client/jetbrains/webview/src/telemetry/EventLogger.ts
@vdavid client/jetbrains/webview/src/telemetry/EventLogger.ts

Comment thread internal/usagestats/usage_stats.go Outdated
Comment thread internal/database/event_logs_test.go Outdated
dadlerj and others added 9 commits August 22, 2022 10:15
…stats package instead; add option to unique user count queries to exclude system/backend users
Co-authored-by: Erik Seliger <erikseliger@me.com>
…Usage -> SiteUsageCurrentPeriods; reduce number of slow SQL queries required to get usage stats for admin panel; update mocks and tests
@dadlerj dadlerj force-pushed the eliminate_usage_stats_deprecated_package-gh branch from 495e6e9 to aa6de8a Compare August 22, 2022 17:15
Comment on lines +658 to +703
FROM event_logs
WHERE (%s)
GROUP BY day_period, week_period, month_period, aggregated_user_id, registered
),
unique_users_by_day AS (
SELECT
day_period,
COUNT(DISTINCT aggregated_user_id) as count,
COUNT(DISTINCT aggregated_user_id) FILTER (WHERE registered) as count_registered
FROM unique_users_by_dwm
GROUP BY day_period
),
unique_users_by_week AS (
SELECT
week_period,
COUNT(DISTINCT aggregated_user_id) as count,
COUNT(DISTINCT aggregated_user_id) FILTER (WHERE registered) as count_registered
FROM unique_users_by_dwm
GROUP BY week_period
),
unique_users_by_month AS (
SELECT
month_period,
COUNT(DISTINCT aggregated_user_id) as count,
COUNT(DISTINCT aggregated_user_id) FILTER (WHERE registered) as count_registered
FROM unique_users_by_dwm
GROUP BY month_period
)
SELECT
all_periods.period,
all_periods.type,
COALESCE(CASE WHEN all_periods.type = 'day'
THEN unique_users_by_day.count
ELSE CASE WHEN all_periods.type = 'week'
THEN unique_users_by_week.count
ELSE unique_users_by_month.count
END
END, 0) count,
COALESCE(CASE WHEN all_periods.type = 'day'
THEN unique_users_by_day.count_registered
ELSE CASE WHEN all_periods.type = 'week'
THEN unique_users_by_week.count_registered
ELSE unique_users_by_month.count_registered
END
END, 0) count_registered
FROM all_periods

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have quite a bit of mixed uses of spaces and tabs in this string, just pick one or the other 😁

CleanShot 2022-08-24 at 13 24 20@2x

@dadlerj dadlerj merged commit 0f23e85 into main Aug 24, 2022
@dadlerj dadlerj deleted the eliminate_usage_stats_deprecated_package-gh branch August 24, 2022 22:10
mrnugget added a commit that referenced this pull request Oct 28, 2022
…overview page (#43591)

This removes the "search code, find ref, search symbol" onboarding checklist from

- global nav bar
- users profile page
- admin analytics overview page

## Why?

Customer reported severe impact on database performance after upgrading from 3.43.2 to 4.0.1. The reason for the performance regression is a change made in #40674: the `user { usageStats }` GraphQL query would now hit PostgreSQL instead of Redis to get the usage stats.

Here is the code path in 3.43.2 for `user { usageStats }`: https://github.com/sourcegraph/sourcegraph/blob/fd7599b9cbf3e80334a6167763b3c74ee9cddf9c/cmd/frontend/internal/usagestatsdeprecated/usage_stats.go#L42-L43

Here it is in 4.0+ https://github.com/sourcegraph/sourcegraph/blob/41e19f207038b4822dc9ebb052eb857e3dae9103/internal/usagestats/usage_stats.go#L110-L136

The SQL queries run by the latter have bad performance because they do multiple queries over one of the biggest tables we have: `event_logs`. Some of the performance problems with that have been addressed in https://github.com/sourcegraph/sourcegraph/pull/43564 and https://github.com/sourcegraph/sourcegraph/pull/43582.

Big problem: **these queries are run on every page load, due to the `withActivation` wrapper we had around the `Layout` component.** That wrapper would load the `usageStatistics` for the currently-logged-in user on every page load to display that onboarding checklist at the top.

On the customer's instance this resulted in slow queries piling up in the database, pinning CPU usage at 100% and exhausting the database connection limits.

After discussing the issue in Slack, [we decided to remove this component altogether](https://sourcegraph.slack.com/archives/C03TGHH1S3V/p1666951569092379?thread_ts=1666747586.140109&cid=C03TGHH1S3V), since we deem the risk low: the component has been unchanged in 3 years, it's unloved and unowned, it has performance implications, and it most likely doesn't have any effect on user onboarding.
BolajiOlajide pushed a commit that referenced this pull request Oct 28, 2022
…overview page (#43591)

This removes the "search code, find ref, search symbol" onboarding checklist from

- global nav bar
- users profile page
- admin analytics overview page

Customer reported severe impact on database performance after upgrading from 3.43.2 to 4.0.1. The reason for the performance regression is a change made in #40674: the `user { usageStats }` GraphQL query would now hit PostgreSQL instead of Redis to get the usage stats.

Here is the code path in 3.43.2 for `user { usageStats }`: https://github.com/sourcegraph/sourcegraph/blob/fd7599b9cbf3e80334a6167763b3c74ee9cddf9c/cmd/frontend/internal/usagestatsdeprecated/usage_stats.go#L42-L43

Here it is in 4.0+ https://github.com/sourcegraph/sourcegraph/blob/41e19f207038b4822dc9ebb052eb857e3dae9103/internal/usagestats/usage_stats.go#L110-L136

The SQL queries run by the latter have bad performance because they do multiple queries over one of the biggest tables we have: `event_logs`. Some of the performance problems with that have been addressed in https://github.com/sourcegraph/sourcegraph/pull/43564 and https://github.com/sourcegraph/sourcegraph/pull/43582.

Big problem: **these queries are run on every page load, due to the `withActivation` wrapper we had around the `Layout` component.** That wrapper would load the `usageStatistics` for the currently-logged-in user on every page load to display that onboarding checklist at the top.

On the customer's instance this resulted in slow queries piling up in the database, pinning CPU usage at 100% and exhausting the database connection limits.

After discussing the issue in Slack, [we decided to remove this component altogether](https://sourcegraph.slack.com/archives/C03TGHH1S3V/p1666951569092379?thread_ts=1666747586.140109&cid=C03TGHH1S3V), since we deem the risk low: the component has been unchanged in 3 years, it's unloved and unowned, it has performance implications, and it most likely doesn't have any effect on user onboarding.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants