Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

PermsSyncer: report exact state of each provider in each job#44257

Merged
bobheadxi merged 4 commits into
mainfrom
11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job
Nov 15, 2022
Merged

PermsSyncer: report exact state of each provider in each job#44257
bobheadxi merged 4 commits into
mainfrom
11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job

Conversation

@bobheadxi

@bobheadxi bobheadxi commented Nov 11, 2022

Copy link
Copy Markdown
Member

Right now there are three ways to try and determine if an authz/authorization provider is happy, each with caveats:

  • through metrics and logs exported by each individual provider implementation. This is very inconsistent, up to each implementation, and difficult to programmatically assess
  • through PermsSyncer job results, which only reports the overall state of a sync job run in a single error. It sometimes doesn't report any error from the provider at all, swallowing the provider's error with special handling
  • through PermsSyncer metrics and logs. This gets us ~almost there, but is difficult to programmatically assess, requires infrastructure access, and requires knowledge of how PermsSyncer is implemented.

This makes assessing the health of authz difficult when:

  • attempting to validate a particular authz provider is behaving correctly - e.g. when restoring from a backup, we want to be able to select a particular provider to ensure permissions sync jobs are correctly involving the provider and succeeding
  • hypothetically, when multiple authz providers are configured

This change propagates the "real" state of each individual provider from the various runSync implementations, so that we can see the state of each provider involved in a permissions sync job. As a simple proof-of-concept, I've added a log field that lists the providers that failed when a job fails, and added some assertions to a few tests to demonstrate what the output looks like.

On top of this, we can then implement a cache that tracks recent permission sync job outcomes: #44258, which can then open the door to allowing us to render these results via the GraphQL API, similarly to the external service sync status API.

Part of https://github.com/sourcegraph/customer/issues/1534
Stacked on https://github.com/sourcegraph/sourcegraph/pull/44251

Test plan

Updated unit and integration tests pass

@cla-bot cla-bot Bot added the cla-signed label Nov 11, 2022
@bobheadxi

bobheadxi commented Nov 11, 2022

Copy link
Copy Markdown
Member Author

@unknwon unknwon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach does seem to make sense to me, will do a thorough review once marked as ready 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any write to providerStates in this method. Do you plan to add it?

@bobheadxi bobheadxi Nov 14, 2022

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several writes, e.g. on line 843:

			providerStates = append(providerStates, providerState{

Comment thread enterprise/cmd/repo-updater/internal/authz/perms_syncer.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a log.Debug statement that would give us the provider states even if there is no error? E.g. something like "All providers succeeded"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

@bobheadxi bobheadxi force-pushed the authz-fetch-user-perms-results branch from 850a68d to 9d688b9 Compare November 14, 2022 18:37
@bobheadxi bobheadxi force-pushed the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch from aa65fb7 to 764cca8 Compare November 14, 2022 18:37
@bobheadxi bobheadxi marked this pull request as ready for review November 14, 2022 18:38
@sourcegraph-bot

sourcegraph-bot commented Nov 14, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 9322c58...e09ee8e.

Notify File(s)
@indradhanush enterprise/cmd/repo-updater/internal/authz/integration_test.go
enterprise/cmd/repo-updater/internal/authz/perms_syncer.go
enterprise/cmd/repo-updater/internal/authz/perms_syncer_test.go
enterprise/cmd/repo-updater/internal/authz/provider_state.go
@unknwon enterprise/cmd/repo-updater/internal/authz/integration_test.go
enterprise/cmd/repo-updater/internal/authz/perms_syncer.go
enterprise/cmd/repo-updater/internal/authz/perms_syncer_test.go
enterprise/cmd/repo-updater/internal/authz/provider_state.go

@bobheadxi bobheadxi requested a review from a team November 14, 2022 18:40
@bobheadxi bobheadxi force-pushed the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch from 764cca8 to 1e27ce1 Compare November 14, 2022 19:45
@bobheadxi bobheadxi force-pushed the authz-fetch-user-perms-results branch from 9d688b9 to 1b5569d Compare November 14, 2022 23:40
@bobheadxi bobheadxi force-pushed the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch from 638f992 to 63c8511 Compare November 14, 2022 23:40
Comment on lines 16 to 17

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just declare a const type ProviderState, and use the type everywhere ProviderStateError/ProviderStateSuccess

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need one for the job state as well, which feels a bit duplicative, and I'm not sure if there's a naming that would make this suitable for both 🤔

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this out for now: setting and checking are both internal implementation details, there are no consumers of this yet. Can revisit in a follow-up if the need arises!

Comment thread enterprise/cmd/repo-updater/internal/authz/provider_state.go Outdated
@bobheadxi bobheadxi force-pushed the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch from 18c457a to c408d9c Compare November 15, 2022 16:45
Base automatically changed from authz-fetch-user-perms-results to main November 15, 2022 17:45
@bobheadxi bobheadxi force-pushed the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch from c408d9c to e09ee8e Compare November 15, 2022 17:46
@bobheadxi bobheadxi enabled auto-merge (squash) November 15, 2022 17:55
@bobheadxi bobheadxi merged commit aa8b47e into main Nov 15, 2022
@bobheadxi bobheadxi deleted the 11-10-PermsSyncer_report_exact_state_of_each_provider_in_each_job branch November 15, 2022 17:59
bobheadxi added a commit that referenced this pull request Nov 16, 2022
Tracks perms sync job outcomes, notably per-provider states (#44257), in Redis. We set an aggressive TTL of 5 minutes by default because in practice, only the details for the last few sync jobs are relevant, and this prevents us from putting too much pressure on Redis on larger instances.

This can be extended in the future to be database-backed, but for now I think Redis is the right choice - the volume of entries can be quite high when accumulated so we want something easy to expire, and realistically only the details for the last few sync jobs are relevant. Additionally, the records store and reader in package synclogs can easily be rewritten without too much impact to write to a database in the future.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants