Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat(batches): use "keyword" as default pattern type#63613

Merged
stefanhengl merged 14 commits into
mainfrom
sh/kw/batch
Jul 9, 2024
Merged

feat(batches): use "keyword" as default pattern type#63613
stefanhengl merged 14 commits into
mainfrom
sh/kw/batch

Conversation

@stefanhengl

@stefanhengl stefanhengl commented Jul 3, 2024

Copy link
Copy Markdown
Member

This is part of the Keyword GA Project.

Batch Changes uses Sourcegraph queries to define the list of repositories on which the batch change will run.

With this change we default to pattern type "keyword" instead of "standard".

To make this a backward compatible change, we also introduce a version identifier to batch specs. Authors can specify version: 2 in the spec, in which case we default to pattern type "keyword". Existing specs (without a specified version) and specs with version: 1 will keep using pattern type "standard".

Screenshot 2024-07-08 at 14 51 27

Notes:

  • Corresponding doc update PR
  • We don't have a query input field, but instead the query is defined in a batch spec YAML. It didn't feel right to edit the YAML and append "patternType: " on save, which is what we do for Code Monitors and Insights.
  • I misuse the pattern type query parameter to effectively override the version. Once we introduce "V4" we should come back here and clean up. I left a TODO in the code.

Test plan:

  • New and updated unit tests
  • manual testing
    • new batch changes use version: 2 by default.
    • using an unsupported version returns an error
    • I ran various "on:" queries to verify that version 2 uses keyword search and version 1 uses standard search.

Changelog

  • The new (optional) field "version" of batch specs determines how the spec is processed. This allows us to introduce new features while maintaining backward compatability.
  • A new version 2 is introduced. Batch specs specifying version: 2 will use keyword search as the default pattern type to determine repos/workspaces. Batch specs with version: 1 or without version field keep using pattern type "standard".

@cla-bot cla-bot Bot added the cla-signed label Jul 3, 2024
@github-actions github-actions Bot added team/product-platform team/search-platform Issues owned by the search platform team labels Jul 3, 2024
@stefanhengl

Copy link
Copy Markdown
Member Author

@eseliger I am not very familiar with the Batch Changes code, so I am looking for some guidance. Do you think this approach makes sense? Especially these 2 points:

  • We don't have a query input field, but instead the query is defined in a batch spec YAML. It didn't feel right to edit the YAML and append "patternType: " on save, which is what we do for Code Monitors and Insights.
  • I hardcoded "keyword" and didn't let the pattern type depend on the settings. Several users might interact with the spec and it would be confusing if the list of repos depended on the user.

Are there any other places in Batch Changes where users can define Sourcegraph queries?

@stefanhengl stefanhengl marked this pull request as ready for review July 3, 2024 13:32
@stefanhengl stefanhengl requested a review from a team July 3, 2024 13:32

@jtibshirani jtibshirani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me (although I am far from a batch changes expert!) It makes sense to use the same default for every user, as batch changes are shared, similar to what we do for notebooks.

You could argue that existing batch changes should use the previous default patterntype standard 🤔 That way, when a user reruns the same batch change after the 5.5 upgrade, it will pull up the same repositories. Curious how important you think it would be to implement that (which would require storing patterntypes in the DB?) versus just keeping it simple with an approach like this. Clarifying: I am not saying "we definitely need to do this", genuinely asking a question!

@eseliger

eseliger commented Jul 3, 2024

Copy link
Copy Markdown
Member

So this means patternType is always overwritten to "keyword"? Or can you still specify the patternType inline?

I think Julies concern is valid - this would change the behavior of existing batch specs 🤔 I'm not sure how important that property is, but it should probably at least be well called out somewhere.

Can we also make sure the examples in client/web/src/enterprise/batches/batch-spec/edit/library/*.yaml still work?


Are there any other places in Batch Changes where users can define Sourcegraph queries?

I think that'd be it.

@stefanhengl

stefanhengl commented Jul 4, 2024

Copy link
Copy Markdown
Member Author

So this means patternType is always overwritten to "keyword"? Or can you still specify the patternType inline?

You can still specify it inline. We just use "keyword" if the user didn't specify anything.

Can we also make sure the examples in client/web/src/enterprise/batches/batch-spec/edit/library/*.yaml still work?

Oops, I didn't know about those. I will take a look.

EDIT:
I can confirm, the examples still work. Only 2 of the examples had a pattern, one specified patternType:regexp, the other just had a single term.

@@ -1,3 +1,4 @@
version: v2

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is taken as the default when opening a new batch request. By updating the example we nudge users towards using v2.

func (wr *workspaceResolver) FindDirectoriesInRepos(ctx context.Context, fileName string, batchSpecVersion string, repos ...*RepoRevision) (map[repoRevKey][]string, error) {
findForRepoRev := func(repoRev *RepoRevision) ([]string, error) {
query := fmt.Sprintf(`file:(^|/)%s$ repo:^%s$@%s type:path count:all`, regexp.QuoteMeta(fileName), regexp.QuoteMeta(string(repoRev.Repo.Name)), repoRev.Commit)
query := fmt.Sprintf(`file:(^|/)%s$ repo:^%s$@%s type:path count:all patterntype:keyword`, regexp.QuoteMeta(fileName), regexp.QuoteMeta(string(repoRev.Repo.Name)), repoRev.Commit)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying patterntype:keyword is not really necessary here, but we should always be explicit when specifying queries anyway.

Comment on lines +391 to +404
func determineDefaultPatternType(batchSpecVersion string) searchquery.SearchType {
switch batchSpecVersion {
case "v1":
return searchquery.SearchTypeStandard
case "v2":
return searchquery.SearchTypeKeyword
default:
return searchquery.SearchTypeStandard
}
}

func (wr *workspaceResolver) runSearch(ctx context.Context, query string, batchSpecVersion string, onMatches func(matches []streamhttp.EventMatch)) (err error) {
defaultPatternType := determineDefaultPatternType(batchSpecVersion)
req, err := streamhttp.NewRequestWithVersion(wr.frontendInternalURL, query, searchAPIVersion, &defaultPatternType)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core of the change. The default pattern type now depends on the batch spec version.

@stefanhengl

stefanhengl commented Jul 4, 2024

Copy link
Copy Markdown
Member Author

I think Julies concern is valid - this would change the behavior of existing batch specs 🤔 I'm not sure how important that property is, but it should probably at least be well called out somewhere.

I updated the PR and introduced a batch spec version. This way, existing batch specs won't change behavior.

@stefanhengl stefanhengl requested review from a team and jtibshirani July 4, 2024 14:23
stefanhengl referenced this pull request in sourcegraph/docs Jul 4, 2024
This adds documentation for the new batch spec version.

Relates to https://github.com/sourcegraph/sourcegraph/pull/63613
stefanhengl referenced this pull request in sourcegraph/docs Jul 8, 2024
This adds documentation for the new batch spec version.

Relates to https://github.com/sourcegraph/sourcegraph/pull/63613

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change LGTM, left a few minor comments.

I'm a bit sad that by default, it's easy to miss that you will need to add version: 2 to get our de-facto standard, but it seems like a lot of extra work to make heuristics based on create date here or so - so let's be pragmatic here.

Needs to update src-cli to support this as well.

We should also add this to the docs to make sure most people will eventually use version: 2.

Comment thread lib/batches/batch_spec.go Outdated
Comment thread schema/batch_spec.schema.json Outdated
Comment thread lib/batches/batch_spec.go Outdated
}

// For backward compatibility, we allow the version field to be omitted.
if spec.Version != "" && !slices.Contains(SupportedBatchSpecVersions, spec.Version) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will have to update src-cli with the latest version of sg/sg/lib and probably add some logic to the query runner there, too so v2 works in CLI execution mode as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eseliger Is the src-cli change a blocker for this PR? Or can we follow-up with the src-cli change even after the release?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually release those in lockstep, but I don't think it has to be in for 5.5.0, we can cut a 5.5.1 as well. We should just make sure to not let that sit for too long, as batch specs shown in the web will no longer work locally (doesn't understand version and silently ignores it)

Comment thread internal/batches/service/workspace_resolver.go Outdated
stefanhengl referenced this pull request in sourcegraph/docs Jul 8, 2024
This adds documentation for the new batch spec version.

Relates to https://github.com/sourcegraph/sourcegraph/pull/63613
@stefanhengl

Copy link
Copy Markdown
Member Author

We should also add this to the docs to make sure most people will eventually use version: 2.

Here is the PR for the docs.

@jtibshirani jtibshirani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a "schema version" for batch changes seems generally useful and works well for this purpose.

When someone creates a batch change through the UI, could we automatically set version: 2? Otherwise it's really non-obvious to users they should be doing this. And if they don't, there will be the inconsistency we're trying to avoid (defaulting to patterntype: standard).

in Go code with the equivalent but clearer `strconv.Iota` call.

# Find all repositories that contain the `fmt.Sprintf` statement using structural search
# Find all repositories that contain the `fmt.Sprintf` statement using regular expression search

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks for fixing this after my recent PR :)

}
if diff := cmp.Diff(want, have); diff != "" {
t.Fatalf("returned workspaces wrong. (-want +got):\n%s", diff)
for _, version := range []int{0, 1, 2} { // Test all versions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny comment, it's unusual to be using "version: 0" to mean "no version is set!" Maybe we could explicitly detect the "no version" case early and swap it out for a valid version in the set [1, 2]?

@stefanhengl stefanhengl Jul 9, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The batch spec parser would be the right place it seems. However, I tried it out, and this would store the updated spec in the DB instead of the original, which causes some of the tests to fail. I am not familiar enough with Batch Changes to judge if that would violate a fundamental assumption about batch specs so I reverted the change. I think that would be a nice change though, so I will follow-up once I have time to dig a bit deeper.

Comment thread CHANGELOG.md Outdated
@stefanhengl

Copy link
Copy Markdown
Member Author

When someone creates a batch change through the UI, could we automatically set version: 2? Otherwise it's really non-obvious to users they should be doing this. And if they don't, there will be the inconsistency we're trying to avoid (defaulting to patterntype: standard).

That's already the case. When you create a batch change, it takes the "hello world" example as starting point and offers the other examples as alternative. I modified all of them to have the version on top.

stefanhengl and others added 3 commits July 9, 2024 09:08
Co-authored-by: Julie Tibshirani <julietibs@apache.org>
@jtibshirani

Copy link
Copy Markdown
Contributor

When you create a batch change, it takes the "hello world" example as starting point and offers the other examples as alternative.

Ah, right! You even explained in https://github.com/sourcegraph/sourcegraph/pull/63613#discussion_r1665745471...

stefanhengl referenced this pull request in sourcegraph/src-cli Jul 10, 2024
This bumps sourcegraph/sourcegraph/lib to pull in the latest changes from https://github.com/sourcegraph/sourcegraph/pull/63613 to support the new `version` field for batch specs.

Notes:
 - To resolve repos, batches calls the GraphQL endpoint `resolveWorkspacesForBatchSpec`, which takes the serialized spec. This means all the logic to resolve repos is outsourced to Sourcegraph. This is great news for this PR because we don't have to worry how `on.RepositoriesMatchingQuery` is interpreted. The code path in which src-cli itself resolves the workspaces was removed in #819.
 
### Test plan
- New unit tests
- Manual testing. I tested the following workflow
  - `src batch new -f test.spec.yaml` (-> spec contains `version: 2`)
  - `src batch apply -f test.spec.yaml`   (-> validation passes,  batch spec shows up in Sourcegraph db)
  - `src batch validate -f test.spec.yaml` (-> validation passes or fails if I change the version to an unsupported value)
  - `src batch repos -f test.spec.yaml` (-> checked that returned repos match with expected pattern type) 

Closes SPLF-126
stefanhengl referenced this pull request in sourcegraph/docs Jul 10, 2024
This adds documentation for the new batch spec version.

Relates to https://github.com/sourcegraph/sourcegraph/pull/63613

Closes SPLF-130
@stefanhengl stefanhengl changed the title batches: use "keyword" as default pattern type feat(batches): use "keyword" as default pattern type Jul 10, 2024
@stefanhengl stefanhengl changed the title feat(batches): use "keyword" as default pattern type feat(batch changes): use "keyword" as default pattern type Jul 10, 2024
@stefanhengl stefanhengl changed the title feat(batch changes): use "keyword" as default pattern type feat(batch_changes): use "keyword" as default pattern type Jul 12, 2024
@stefanhengl stefanhengl changed the title feat(batch_changes): use "keyword" as default pattern type feat(batches): use "keyword" as default pattern type Jul 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/search-platform Issues owned by the search platform team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants