Automatic workspace discovery #442
Heads up reviewers: I just noticed that I forgot to add a validation that produces an error and a human-readable message when multiple changesets with the same branch have been produced in the same repository. I'll add that, but that shouldn't stop anyone from reviewing this.
    defer rz.mu.Unlock()

    rz.references -= 1
    if rz.references == 0 && rz.fetcher.deleteZips {
Wouldn't this mean that when using 1 worker thread, there would be no caching at all? And can there be a case where the workers are utilized like the following,
[repo A:/path1]
[repo B]
[repo C]
[repo D]
=>
[repo A:/path2]
[repo B]
[repo C]
[repo D]
where the worker would close the zip for repo A at path1, and then need to refetch for path2?
Nice catch. You're right. I'll address it in a follow-up PR.
There are two things that I need to do:
I'll address those in follow-up PRs to make this easier to review.
This is a follow-up to #442 and ensures that changeset specs are not getting silently lost by validating that multiple changeset specs in the same repository have different branches. I decided to make this a separate step _after_ the execution of the steps so that users can leverage the cache. That allows them to change the campaign spec and then rerun the command after they get this error, vs. the execution being aborted after running into this error (if we'd do the check inside executor).
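A minimal Go sketch of such a post-execution check might look like the following. The `changesetSpec` type, its fields, and the error message are simplified assumptions for illustration; only the name `duplicateBranchesErr` is borrowed from the commit message, and its shape here is a guess rather than the actual src-cli code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// changesetSpec is a hypothetical, trimmed-down stand-in for the real type.
type changesetSpec struct {
	Repo   string
	Branch string
}

// duplicateBranchesErr records, per repository, each branch used by more
// than one changeset spec and how many specs use it.
type duplicateBranchesErr struct {
	dupes map[string]map[string]int
}

func (e *duplicateBranchesErr) Error() string {
	var lines []string
	for repo, branches := range e.dupes {
		for branch, n := range branches {
			lines = append(lines, fmt.Sprintf("  %s: %d changeset specs have the branch %q", repo, n, branch))
		}
	}
	sort.Strings(lines) // map iteration order is random; keep the message stable
	return "multiple changeset specs in the same repository have the same branch:\n" + strings.Join(lines, "\n")
}

// validateBranches returns a *duplicateBranchesErr if any repository has two
// or more changeset specs with the same branch, and nil otherwise.
func validateBranches(specs []changesetSpec) error {
	counts := map[string]map[string]int{}
	for _, s := range specs {
		if counts[s.Repo] == nil {
			counts[s.Repo] = map[string]int{}
		}
		counts[s.Repo][s.Branch]++
	}

	dupes := map[string]map[string]int{}
	for repo, branches := range counts {
		for branch, n := range branches {
			if n < 2 {
				continue
			}
			if dupes[repo] == nil {
				dupes[repo] = map[string]int{}
			}
			dupes[repo][branch] = n
		}
	}
	if len(dupes) == 0 {
		return nil
	}
	return &duplicateBranchesErr{dupes: dupes}
}

func main() {
	specs := []changesetSpec{
		{Repo: "github.com/example/mono", Branch: "workspace-a"},
		{Repo: "github.com/example/mono", Branch: "workspace-a"},
		{Repo: "github.com/example/other", Branch: "workspace-a"},
	}
	if err := validateBranches(specs); err != nil {
		fmt.Println(err)
	}
}
```

Because the check only looks at the finished list of changeset specs, it can run after the steps have executed and been cached, which is the ordering described above.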
This is a follow-up to fix the issue discovered by @eseliger here: #442 (comment)

Short version: the previous implementation would only avoid deleting an archive if there were *currently active tasks* holding references to it. If tasks that need the same archive executed sequentially, though, the archive would be downloaded, deleted, and downloaded again.

This fixes the issue by first marking all repository archives for later use. An archive is deleted only once all of its marks have been turned into references and those references have been closed.
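The mark-then-reference idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the follow-up PR: the `archive` type, its fields, and the `Mark`/`Acquire`/`Release` method names are all hypothetical, and the download is simulated with a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// archive is a hypothetical stand-in for a downloaded repository zip.
type archive struct {
	mu         sync.Mutex
	marks      int  // tasks that will need the archive later
	references int  // tasks currently using the archive
	onDisk     bool // whether the (simulated) zip is present
	downloads  int  // how often the zip was fetched, for demonstration
	deleteZips bool // whether zips should be cleaned up at all
}

// Mark registers one future use of the archive.
func (a *archive) Mark() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.marks++
}

// Acquire turns one mark into a reference, fetching the zip if needed.
func (a *archive) Acquire() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.marks > 0 {
		a.marks--
	}
	if !a.onDisk {
		a.onDisk = true // simulated download
		a.downloads++
	}
	a.references++
}

// Release closes one reference. The zip is deleted only when no task is
// using it and no marked task is still waiting to use it.
func (a *archive) Release() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.references--
	if a.references == 0 && a.marks == 0 && a.deleteZips {
		a.onDisk = false // simulated delete
	}
}

func main() {
	a := &archive{deleteZips: true}

	// Two tasks (e.g. two workspaces in repo A) are known up front.
	a.Mark()
	a.Mark()

	// A single worker runs them one after the other.
	a.Acquire()
	a.Release()
	a.Acquire()
	a.Release()

	fmt.Println("downloads:", a.downloads, "on disk:", a.onDisk)
}
```

With the extra `marks == 0` condition, the sequential single-worker case from the review comment fetches the archive once instead of twice.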
* Check for branch duplicates after creating changeset specs

  This is a follow-up to #442 and ensures that changeset specs are not getting silently lost by validating that multiple changeset specs in the same repository have different branches. I decided to make this a separate step _after_ the execution of the steps so that users can leverage the cache. That allows them to change the campaign spec and then rerun the command after they get this error, vs. the execution being aborted after running into this error (if we'd do the check inside executor).

* Fix naming in duplicateBranchesErr

* Implement dynamic workspace discovery
* update schema and fix template helpers
* Add changelog entry
* Rename file
* Change naming
* Use strings.ReplaceAll
What?
This adds automatic workspace discovery to src-cli. It allows users to
`steps` in those project folders, turning them into workspaces.

How?
Users define workspaces like so:
That means: in every repository that starts with `github.com/sourcegraph/sourc` projects have a `go.mod` at their root and those folders should be used as `workspaces` for the execution of campaign spec `steps`.

src-cli uses Sourcegraph search under the hood to search for the locations of the `rootAtLocationOf` file, which means it doesn't need to download the repository first and search the file system.

`workspaces` can also contain multiple definitions, matching different repositories (but a repository cannot be matched by multiple definitions):

Since multiple workspaces per repository means that multiple changesets will be produced in a single repository, the `changesetTemplate.branch` needs to use templating to avoid name clashes.

For that, users can access the template variable `steps.path` and use helper functions to generate a unique branch name per changeset. Example:

(The `join_if` and the `replace` helpers are new. `join_if` joins the given list of strings, ignoring blank strings. `replace` is an alias for `strings.ReplaceAll`.)

Users can, of course, also use other ways to generate a unique branch name per changeset. With `outputs`, for example:

Or, in combination:
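For reference, the two new helpers described above (`join_if` and `replace`) might look roughly like this when registered with Go's `text/template`. This is a sketch under assumptions: the argument order of `join_if` (separator first), the treatment of "blank" as the empty string, the `.Path` field standing in for `steps.path`, and the `my-branch` prefix are all hypothetical and not taken from the src-cli source.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// joinIf joins the non-blank elements with sep.
// Assumption: separator first, and "blank" means the empty string.
func joinIf(sep string, elems ...string) string {
	var nonBlank []string
	for _, e := range elems {
		if e != "" {
			nonBlank = append(nonBlank, e)
		}
	}
	return strings.Join(nonBlank, sep)
}

// funcs registers the helpers under the names used in the description.
var funcs = template.FuncMap{
	"join_if": joinIf,
	"replace": strings.ReplaceAll, // replace is an alias for strings.ReplaceAll
}

// branchName renders a hypothetical branch template for one workspace path.
func branchName(workspacePath string) (string, error) {
	tmpl, err := template.New("branch").Funcs(funcs).Parse(
		`{{ join_if "-" "my-branch" (replace .Path "/" "-") }}`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, struct{ Path string }{workspacePath})
	return sb.String(), err
}

func main() {
	for _, p := range []string{"", "cmd/frontend"} {
		name, err := branchName(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %q\n", p, name)
	}
}
```

Skipping blank elements matters for the root workspace: when the path is empty, the branch name has no trailing separator.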
Details & Edge Cases
* If a repository is matched by `on` and not matched by a `workspaces.in:` glob, the `steps` will be executed in its root folder.
* If a repository is matched by `on` and matches a `workspace.in:` glob, but there are no workspaces in it that contain the file in `rootAtLocationOf`, then the `steps` won't be executed in the repository.

Dependency
This requires the addition of `workspaces` to the campaign spec schema, which means it requires changes to the Sourcegraph server.

The PR is here: https://github.com/sourcegraph/sourcegraph/pull/17757
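The rules listed under "Details & Edge Cases" can be sketched as a small Go function. This is an illustration under assumptions: the `workspaceConfig` type and `resolveDirs` function are hypothetical, the list of found files stands in for whatever Sourcegraph search returns, the repository names are made up, and the standard library's `path.Match` is used as a stand-in for whatever glob matching src-cli actually uses.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// workspaceConfig mirrors the two fields named in the description.
type workspaceConfig struct {
	RootAtLocationOf string // file whose location marks a workspace root
	In               string // glob matched against the repository name
}

// resolveDirs returns the directories in which steps should run for repo.
// foundFiles is a hypothetical list of file paths reported by search.
// "" denotes the repository root; an empty result means "skip the repo".
func resolveDirs(repo string, configs []workspaceConfig, foundFiles []string) ([]string, error) {
	var matched *workspaceConfig
	for i := range configs {
		ok, err := path.Match(configs[i].In, repo)
		if err != nil {
			return nil, err
		}
		if !ok {
			continue
		}
		if matched != nil {
			// A repository cannot be matched by multiple definitions.
			return nil, fmt.Errorf("repository %s matches multiple workspaces.in globs", repo)
		}
		matched = &configs[i]
	}

	// Not matched by any glob: run the steps in the root folder.
	if matched == nil {
		return []string{""}, nil
	}

	// Matched: every directory containing the file becomes a workspace.
	seen := map[string]bool{}
	var dirs []string
	for _, f := range foundFiles {
		if path.Base(f) != matched.RootAtLocationOf {
			continue
		}
		d := path.Dir(f)
		if d == "." {
			d = ""
		}
		if !seen[d] {
			seen[d] = true
			dirs = append(dirs, d)
		}
	}
	sort.Strings(dirs)
	return dirs, nil // may be empty: the steps won't run in this repository
}

func main() {
	cfgs := []workspaceConfig{{RootAtLocationOf: "go.mod", In: "github.com/acme/mono*"}}
	dirs, err := resolveDirs("github.com/acme/monorepo", cfgs, []string{"a/go.mod", "b/c/go.mod"})
	fmt.Println(dirs, err)
}
```

Returning an empty slice for the "matched but nothing found" case keeps it distinct from the "not matched" case, which yields the root folder.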
What's not included
src-cli still downloads a complete archive of every matched repository, even if the steps should only be included in subdirectories. Only downloading archives of the workspace directories is something we should implement in the near future to make support for large monorepos better.
Full campaign spec to try this at home
There you go: