This repository was archived by the owner on Sep 30, 2024. It is now read-only.

usage-data: push scraped events to pubsub if telemetry job is enabled#39389

Merged: coury-clark merged 14 commits into main from usage-data/push-to-pubsub on Jul 26, 2022

coury-clark commented Jul 25, 2022

Closes https://github.com/sourcegraph/sourcegraph/issues/39088

  1. Pushes scraped events to pubsub.
  2. Adds site settings to control the project and topic.

Note: If you inspect the pubsub topic below, you will see a variety of message formats. The format is still not settled, so for now we won't worry about whether the schema in the destination is exactly correct.

Test plan

To run / test locally:

Using credentials: https://start.1password.com/open/i?a=HEDEDSLHPBFGRBTKAKJWE23XX4&h=team-sourcegraph.1password.com&i=fpusirhqcoi744acjtowvzfixq&v=dnrhbauihkhjs5ag6vszsme45a

Copy that file to a local path, then start sg with:

GOOGLE_APPLICATION_CREDENTIALS="path/to/file.json" sg start...

Add to your site config:

  "exportUsageTelemetry": {
    "enabled": true,
    "topicProjectName": "sourcegraph-dogfood",
    "topicName": "usage-data-testing"
  },
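
As an illustration of how these settings map to a destination, here is a minimal Go sketch that reads the block and derives the fully qualified Pub/Sub topic name (`projects/<project>/topics/<topic>`). The type and function names are hypothetical, the validation rule is an assumption, and the sketch expects strict JSON, so a config with comments or trailing commas would need a more lenient parser.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ExportUsageTelemetry mirrors the site-config keys shown in the test plan.
type ExportUsageTelemetry struct {
	Enabled          bool   `json:"enabled"`
	TopicProjectName string `json:"topicProjectName"`
	TopicName        string `json:"topicName"`
}

// siteConfig models only the one key this sketch cares about.
type siteConfig struct {
	ExportUsageTelemetry *ExportUsageTelemetry `json:"exportUsageTelemetry"`
}

// topicPath returns the fully qualified topic name, or "" when the
// export block is absent or disabled.
func topicPath(raw []byte) (string, error) {
	var cfg siteConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("parse site config: %w", err)
	}
	t := cfg.ExportUsageTelemetry
	if t == nil || !t.Enabled {
		return "", nil
	}
	// Assumption: both names are required once the export is enabled.
	if t.TopicProjectName == "" || t.TopicName == "" {
		return "", errors.New("exportUsageTelemetry: topicProjectName and topicName are required when enabled")
	}
	return fmt.Sprintf("projects/%s/topics/%s", t.TopicProjectName, t.TopicName), nil
}

func main() {
	raw := []byte(`{"exportUsageTelemetry": {"enabled": true, "topicProjectName": "sourcegraph-dogfood", "topicName": "usage-data-testing"}}`)
	path, err := topicPath(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // prints "projects/sourcegraph-dogfood/topics/usage-data-testing"
}
```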

A variety of events pushed from local runs can be found here: https://console.cloud.google.com/cloudpubsub/topic/detail/usage-data-testing?authuser=0&project=sourcegraph-dogfood&tab=messages

coury-clark and others added 12 commits July 25, 2022 09:52
Instrumentation showed that these endpoints aren't being used and it
looks like there weren't even any handlers attached to the routes.

Confirmed that they were removed a long time ago:
https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/commit/8572e643fa4a0a024a4be7d2441a45649800368f

Finally fixing this after it made me say "ah! oh no" one too many times
in the past few weeks.

Here's what previously happened on THORSTEN & SG GENERATE:

1. Thorsten has a custom commandset named `horsegraph` in `sg.config.overwrite.yaml`, along with some other custom commands.
2. Thorsten creates a database migration/GraphQL schema addition/...
3. Thorsten runs `sg generate`
4. Thorsten commits and pushes the commit
5. Thorsten sees that he pushed a commit in which `sg`'s reference in the
   documentation now contains `"horsegraph"` as an official commandset
   to be used with `sg start`
6. Thorsten says "ah! oh no" and undoes changes

... multiple times.

So what this does here is introduce a `disable-overwrite` flag that
causes only the standard config to be read.

It's then used in the `go:generate` directive that runs `sg help`.
@cla-bot cla-bot Bot added the cla-signed label Jul 25, 2022
@coury-clark coury-clark requested a review from a team July 25, 2022 21:40
@coury-clark coury-clark marked this pull request as ready for review July 25, 2022 21:40
sourcegraph-bot commented Jul 25, 2022

Codenotify: Notifying subscribers in CODENOTIFY files for diff 031f22f...9932595.

Notify: @efritz
File(s):
enterprise/cmd/worker/internal/telemetry/telemetry_job.go
enterprise/cmd/worker/internal/telemetry/telemetry_job_test.go

@coury-clark coury-clark changed the title Usage data/push to pubsub usage-data: push scraped events to pubsub if telemetry job is enabled Jul 25, 2022

chwarwick left a comment

Some thoughts in comments but nothing blocking.

An easier local testing setup may be to use gcloud and run `gcloud auth application-default login`. That way it runs with your own user's permissions and you don't need to share keys.

I assume this is going to be a cross-project publish when it runs in production. The topic config will probably need some additional considerations, because I anticipate there are going to be issues using the default client credential locations to specify creds for a different project (mainly that they would also get picked up by any other running process).

coury-clark commented

> I assume this is going to be a cross-project publish when it runs in production. The topic config will probably need some additional considerations, because I anticipate there are going to be issues using the default client credential locations to specify creds for a different project (mainly that they would also get picked up by any other running process).

In the MI cloud they will be using workload identity; these credentials are just for testing locally. Thanks for pointing out `gcloud auth application-default login` though. I wasn't aware of that and will give it a try.

@coury-clark coury-clark merged commit 70b8f57 into main Jul 26, 2022
@coury-clark coury-clark deleted the usage-data/push-to-pubsub branch July 26, 2022 16:44

Linked issue: usage-data: sanitize and publish scraped events to pub/sub

6 participants