Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

[Backport 5.1] Schedule embeddings jobs will not roll back transaction on read errors#54804

Merged
coury-clark merged 1 commit into
5.1from
backport-54701-to-5.1
Jul 11, 2023
Merged

[Backport 5.1] Schedule embeddings jobs will not roll back transaction on read errors#54804
coury-clark merged 1 commit into
5.1from
backport-54701-to-5.1

Conversation

@github-actions

Copy link
Copy Markdown
Contributor

When a set of repositories are to be scheduled for embeddings (via graphql or a policy) an error with a single repository can cause the entire set of embeddings jobs to be rolled back. The specific error that occurred for a recent customer issue (slack discussion) was due to one of the repos being empty and causing all 500 repos matched by a policy to not be enqueued for embeddings jobs. Not only do all repos have their embeddings jobs rolled back but additionally the logging will only report one problematic repo before exiting the scheduling logic which makes it so that the log does not capture all of the problematic repos in one attempt.

We should instead allow read errors (e.g. reading sql or fetching information from gitserver) to be handled but not prevent the entire batch of jobs to be rolled back. This will allow a policy to be partially impactful across the repositories it is concerned with. Especially given that an empty repo won't have any need to be enqueued for an embeddings job.

This PR changes the error handling so that repos that cannot complete an embeddings job will still be enqueued with a failed state so that site admin pages can still display which individual repos are problematic. When a repo is enqueued with a failed state we do not roll back the transaction so that successful embeddings jobs are still created with a queued state.

Test plan

will add test case to either scheduling logic tests or job execution logic tests (depends on implementation)

<!-- All pull requests REQUIRE a test plan: https://docs.sourcegraph.com/dev/background-information/testing_principles -->

Backport 57146ce from #54701

#54701)

When a set of repositories are to be scheduled for embeddings (via
graphql or a policy) an error with a single repository can cause the
entire set of embeddings jobs to be rolled back. The specific error that
occurred for a recent customer issue ([slack
discussion](https://sourcegraph.slack.com/archives/C053L1AQ0BC/p1688671109829779))
was due to one of the [repos being
empty](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/embeddings/schedule.go?L36-42)
and causing all 500 repos matched by a policy to not be enqueued for
embeddings jobs. Not only do all repos have their embeddings jobs rolled
back but additionally the logging will only report one problematic repo
before exiting the scheduling logic which makes it so that the log does
not capture all of the problematic repos in one attempt.

We should instead allow read errors (e.g. reading sql or fetching
information from gitserver) to be handled but not prevent the entire
batch of jobs to be rolled back. This will allow a policy to be
partially impactful across the repositories it is concerned with.
Especially given that an empty repo won't have any need to be enqueued
for an embeddings job.

This PR changes the error handling so that repos that cannot complete an
embeddings job will still be enqueued with a `failed` state so that site
admin pages can still display which individual repos are problematic.
When a repo is enqueued with a `failed` state we do not roll back the
transaction so that successful embeddings jobs are still created with a
`queued` state.

## Test plan

will add test case to either scheduling logic tests or job execution
logic tests (depends on implementation)

<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->

(cherry picked from commit 57146ce)
@sourcegraph-bot

Copy link
Copy Markdown
Contributor

📖 Storybook live preview

@coury-clark coury-clark merged commit d01cf0c into 5.1 Jul 11, 2023
@coury-clark coury-clark deleted the backport-54701-to-5.1 branch July 11, 2023 20:15
davejrt referenced this pull request Jul 12, 2023
…ty revision string (#54879)

This [PR](https://github.com/sourcegraph/sourcegraph/pull/54804) fixed
an issue with job scheduling/execution in the backend. The interim fix
of this PR relies on allowing for empty string `revision` values written
with a repo embedding job when the repo is empty or some other issue
occurs when fetching the repo's default branch (e.g. empty repo).

Empty revision value causes issues with graphql query repoEmbeddingJobs
when [querying for
Revision](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/site-admin/cody/backend.ts?L32-35)
as our site admin jobs panel does.

`Panic occurred: runtime error: slice bounds out of range [:7] with
length 0`

This PR updates the embedding job resolver to consider empty revision
string acceptable and not call constructor for git commit resolver. Also
site admin jobs list will now show repo name if repo is non-null and
revision is null.

## Test plan

<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->

sg manual test
gl-srgr referenced this pull request Jul 12, 2023
…ty revision string (#54879)

This [PR](https://github.com/sourcegraph/sourcegraph/pull/54804) fixed
an issue with job scheduling/execution in the backend. The interim fix
of this PR relies on allowing for empty string `revision` values written
with a repo embedding job when the repo is empty or some other issue
occurs when fetching the repo's default branch (e.g. empty repo).

Empty revision value causes issues with graphql query repoEmbeddingJobs
when [querying for
Revision](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/site-admin/cody/backend.ts?L32-35)
as our site admin jobs panel does.

`Panic occurred: runtime error: slice bounds out of range [:7] with
length 0`

This PR updates the embedding job resolver to consider empty revision
string acceptable and not call constructor for git commit resolver. Also
site admin jobs list will now show repo name if repo is non-null and
revision is null.

<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->

sg manual test

(cherry picked from commit 1dd6db4)
camdencheek referenced this pull request Jul 12, 2023
…ty revision string (#54879)

This [PR](https://github.com/sourcegraph/sourcegraph/pull/54804) fixed
an issue with job scheduling/execution in the backend. The interim fix
of this PR relies on allowing for empty string `revision` values written
with a repo embedding job when the repo is empty or some other issue
occurs when fetching the repo's default branch (e.g. empty repo).

Empty revision value causes issues with graphql query repoEmbeddingJobs
when [querying for
Revision](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/site-admin/cody/backend.ts?L32-35)
as our site admin jobs panel does.

`Panic occurred: runtime error: slice bounds out of range [:7] with
length 0`

This PR updates the embedding job resolver to consider empty revision
string acceptable and not call constructor for git commit resolver. Also
site admin jobs list will now show repo name if repo is non-null and
revision is null.

<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->

sg manual test

(cherry picked from commit 1dd6db4)
camdencheek referenced this pull request Jul 12, 2023
…ddings job's empty revision string (#54881)

This [PR](https://github.com/sourcegraph/sourcegraph/pull/54804) fixed
an issue with job scheduling/execution in the backend. The interim fix
of this PR relies on allowing for empty string `revision` values written
with a repo embedding job when the repo is empty or some other issue
occurs when fetching the repo's default branch (e.g. empty repo).

Empty revision value causes issues with graphql query repoEmbeddingJobs
when [querying for

Revision](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/site-admin/cody/backend.ts?L32-35)
as our site admin jobs panel does.

`Panic occurred: runtime error: slice bounds out of range [:7] with
length 0`

This PR updates the embedding job resolver to consider empty revision
string acceptable and not call constructor for git commit resolver. Also
site admin jobs list will now show repo name if repo is non-null and
revision is null.

Backport of https://github.com/sourcegraph/sourcegraph/pull/54879
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants