This repository was archived by the owner on Sep 30, 2024. It is now read-only.

search: support draining Zoekt instances#62005

Merged: stefanhengl merged 5 commits into main from k/sh/search/drain-zoekt on Apr 18, 2024

Conversation

@keegancsmith

Member

This is an alternative implementation to https://github.com/sourcegraph/sourcegraph/pull/61833

This PR adds support for draining a zoekt replica by including its hostname in the comma-separated environment variable INDEXED_SEARCH_DRAIN_SERVERS on sourcegraph-frontend.
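As an illustration of the configuration format, here is a minimal sketch of reading such a comma-separated list. The helper name `parseDrainServers` and the whitespace handling are assumptions made for this example; they are not taken from the PR's diff.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseDrainServers is a hypothetical helper: it splits a comma-separated
// list of hostnames into a set, trimming whitespace and skipping empty items.
func parseDrainServers(raw string) map[string]struct{} {
	drained := make(map[string]struct{})
	for _, host := range strings.Split(raw, ",") {
		host = strings.TrimSpace(host)
		if host != "" {
			drained[host] = struct{}{}
		}
	}
	return drained
}

func main() {
	// Only the variable name comes from the PR description.
	drained := parseDrainServers(os.Getenv("INDEXED_SEARCH_DRAIN_SERVERS"))
	fmt.Printf("%d endpoint(s) marked for draining\n", len(drained))
}
```

With this sketch, an unset or empty variable yields an empty set, so no endpoint is drained.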

This is implemented by adjusting the endpoint map we use when assigning repos to endpoints. We still report the drained hostname as part of the list of endpoints. However, the endpoint is left out of the consistent hash which maps repositories to endpoints.
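The idea can be sketched as follows. This is not the code from the PR: the `endpointMap` type and its methods are hypothetical, and rendezvous (highest-random-weight) hashing stands in for whatever consistent hash the frontend actually uses. The point it shows is that a drained endpoint stays in the reported list but never wins an assignment.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// endpointMap is a hypothetical stand-in for the frontend's endpoint map.
type endpointMap struct {
	all     []string            // every known endpoint, drained or not
	drained map[string]struct{} // endpoints excluded from repo assignment
}

// Endpoints returns all endpoints, so searches still reach drained replicas.
func (m *endpointMap) Endpoints() []string { return m.all }

// score hashes an (endpoint, repo) pair; the highest score wins the repo.
func score(endpoint, repo string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(endpoint))
	h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(repo))
	return h.Sum64()
}

// Assign picks the endpoint that should index repo, skipping drained ones.
// It returns false when no endpoint is available.
func (m *endpointMap) Assign(repo string) (string, bool) {
	best, bestScore, found := "", uint64(0), false
	for _, ep := range m.all {
		if _, skip := m.drained[ep]; skip {
			continue
		}
		if s := score(ep, repo); !found || s > bestScore {
			best, bestScore, found = ep, s, true
		}
	}
	return best, found
}

func main() {
	m := &endpointMap{
		all:     []string{"localhost:3070", "localhost:3071"},
		drained: map[string]struct{}{"localhost:3070": {}},
	}
	owner, _ := m.Assign("github.com/example/repo")
	fmt.Println("search targets:", m.Endpoints(), "index owner:", owner)
}
```

In this sketch, a repo that was already assigned to a non-drained endpoint keeps that assignment when another endpoint is drained, so only the drained replica's repos move.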

Our interactions with zoekt are already designed to rebalance smoothly when the set of endpoints changes. We have logic to only remove a repo from a replica once its new endpoint has it, and we support deduplication of search results across endpoints.
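The two safeguards mentioned here can be sketched roughly as below. All names (`indexed`, `safeToDrop`, `match`, `dedupe`) are hypothetical and the logic is a simplification for illustration; the real behaviour lives in zoekt and the frontend and may differ in detail.

```go
package main

import "fmt"

// indexed maps an endpoint to the set of repos it currently has indexed.
type indexed map[string]map[string]bool

// safeToDrop reports whether holder may delete its copy of repo: only when
// holder is no longer the assigned owner and the new owner already has it.
func safeToDrop(state indexed, repo, holder, owner string) bool {
	if holder == owner {
		return false
	}
	return state[owner][repo] // false if the owner has not indexed it yet
}

// match is a simplified repo-level search result.
type match struct {
	Repo     string
	Endpoint string
}

// dedupe keeps the first result seen per repo, so a repo that is temporarily
// indexed on two endpoints during rebalancing is reported only once.
func dedupe(matches []match) []match {
	seen := make(map[string]bool)
	var out []match
	for _, m := range matches {
		if !seen[m.Repo] {
			seen[m.Repo] = true
			out = append(out, m)
		}
	}
	return out
}

func main() {
	state := indexed{"localhost:3071": {"repo-a": true}}
	fmt.Println(safeToDrop(state, "repo-a", "localhost:3070", "localhost:3071"))
	fmt.Println(len(dedupe([]match{
		{"repo-a", "localhost:3070"},
		{"repo-a", "localhost:3071"},
	})))
}
```

Together these explain why a drain has no user impact in this model: the old replica keeps serving a repo until the new one has it, and the overlap period does not produce duplicate results.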

Test Plan: Stefan manually tested it.

Co-authored-by: @stefanhengl

stefanhengl and others added 5 commits April 12, 2024 14:30
This change lets us set the ENV on frontend with the Zoekt endpoints we want
to drain. As a consequence all repos indexed by those instances will be
assigned to other instances. Search still targets all instances. After
an instance has been fully drained, we can remove it without user
impact.

Here is what the process would look like:

1. Set ENV on frontend and restart: INDEXED_SEARCH_DRAIN_SERVERS=<endpoint>
2. Monitor Zoekt dashboard to see when indexes have been migrated.
   Depending on the number of repos, this can take hours or even days.
3. Remove empty instance
4. Unset ENV

Test plan:
- New unit test
- manual testing
@keegancsmith keegancsmith requested a review from a team April 18, 2024 12:53
@cla-bot cla-bot Bot added the cla-signed label Apr 18, 2024
@github-actions github-actions Bot added the team/product-platform and team/search-platform labels Apr 18, 2024
@stefanhengl

Member

Here I move the repos back and forth between two Zoekt instances by setting INDEXED_SEARCH_DRAIN_SERVERS to either localhost:3070 or localhost:3071.

image

@stefanhengl stefanhengl merged commit 3313683 into main Apr 18, 2024
@stefanhengl stefanhengl deleted the k/sh/search/drain-zoekt branch April 18, 2024 13:40

@jtibshirani jtibshirani left a comment

Contributor

Nice, this turned out quite clean.

sourcegraph-release-bot pushed a commit that referenced this pull request Apr 19, 2024
Co-authored-by: Stefan Hengl <stefan@sourcegraph.com>
(cherry picked from commit 3313683)
keegancsmith added a commit that referenced this pull request Apr 19, 2024
search: support draining Zoekt instances (#62005)

Co-authored-by: Stefan Hengl <stefan@sourcegraph.com>
(cherry picked from commit 3313683)

Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com>