Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

batches: fix naming of forks created by batch changes (bitbucket server)#43681

Merged
adeola-ak merged 20 commits into
mainfrom
aa/fix-fork-name
Nov 11, 2022
Merged

batches: fix naming of forks created by batch changes (bitbucket server)#43681
adeola-ak merged 20 commits into
mainfrom
aa/fix-fork-name

Conversation

@adeola-ak

@adeola-ak adeola-ak commented Oct 31, 2022

Copy link
Copy Markdown
Contributor

closes: #41641

When batchChanges.enforceForks are enabled for batch changes we currently do not handle the case of repos that are not forks yet have the same name. Currently the changeset would fail to be created and the following error would occur:

Getting user fork for "{batch_changes_user}": repo was not forked from the given parent

This was fixed by changing the naming convention of the forks to be more unique, updating the name to parent.Project.Key + "-" + parent.Slug

I will file a follow up issue to see if the error that is returned from the "Not a Fork" test could be a little more descriptive in telling the user you are not permitted to create a fork of a repo that is user namespaced. That action currently fails with the following error:

Failed to run operations on changeset Getting user fork: creating user fork for "milton": Bitbucket API HTTP error: code=409 url="https://bitbucket.sgdev.org/rest/api/1.0/projects/~MILTON/repos/vcr-fork-test-repo" body="{"errors":[{"context":"name","message":"This repository URL is already taken.","exceptionName":null}]}"

Test plan

tested implementation by creating forks for same name repos, that are no longer failing at changeset creation and no longer mistakenly being detected as forks

@adeola-ak adeola-ak changed the title batches: batches: fix naming of forks created by batch changes Oct 31, 2022
@cla-bot cla-bot Bot added the cla-signed label Nov 3, 2022
@adeola-ak adeola-ak marked this pull request as ready for review November 7, 2022 23:13
@adeola-ak adeola-ak requested a review from a team November 7, 2022 23:14
@sourcegraph-bot

sourcegraph-bot commented Nov 7, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff bde3ece...be7329a.

Notify File(s)
@eseliger enterprise/internal/batches/sources/bitbucketserver.go
enterprise/internal/batches/sources/bitbucketserver_test.go
enterprise/internal/batches/sources/testdata/golden/TestBitbucketServerSource_GetUserFork_already_forked
enterprise/internal/batches/sources/testdata/golden/TestBitbucketServerSource_GetUserFork_new_fork
enterprise/internal/batches/sources/testdata/sources/TestBitbucketServerSource_GetUserFork_already_forked.yaml
enterprise/internal/batches/sources/testdata/sources/TestBitbucketServerSource_GetUserFork_bad_username.yaml
enterprise/internal/batches/sources/testdata/sources/TestBitbucketServerSource_GetUserFork_new_fork.yaml
enterprise/internal/batches/sources/testdata/sources/TestBitbucketServerSource_GetUserFork_not_a_fork.yaml
enterprise/internal/batches/sources/testdata/sources/TestBitbucketServerSource_GetUserFork_not_forked_from_parent.yaml

@courier-new

Copy link
Copy Markdown
Contributor

Thanks for this!! Did you confirm that this issue isn't a problem for BitBucket Cloud, GitHub, or GitLab? My guess is that they'll probably all have the same or very similarly-behaving issues.

@courier-new

Copy link
Copy Markdown
Contributor

Adeola and I tested the other code hosts together in our 1:1, here are our findings:

  • GitHub: It forks them successfully under unique names, with an increment (fork, fork-1).
  • GitLab: It does not fork them successfully.
  • BitBucket Cloud: It does not fork them successfully.

Even though GitHub handles the case okay, we agreed that our <org>-<repo> naming scheme is more informative than the base incrementing pattern that GitHub does by default. For consistency, we'll apply the same pattern across all code hosts when forking.

@adeola-ak

Copy link
Copy Markdown
Contributor Author

I will fix the remaining codehosts in a separate ticket and link it here

// If an update is run by someone who's not aharvey, this needs to be a
// repo that isn't a fork.
target := newBitbucketServerRepo(urn, "~AHARVEY", "old-talk", 0)
// This is a repo that isn't a fork. Use credentials in 1Password for "milton" to

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these tests going to our actual Bitbucket instance or is the test data being injected (somehow)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Piszmog actual instance, docs on this process coming soon :)

fork, err := bbsSrc.GetUserFork(ctx, target)
assert.Nil(t, fork)
assert.ErrorIs(t, err, errNotAFork)
_, err = bbsSrc.GetUserFork(ctx, target)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be good to ensure that fork is nil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought thats what this line does? assert.Nil(t, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right, but it looks like from the diff that line got deleted, though. I think Randell's saying we should keep it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i had removed that because 'fork' was undeclared

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! If you could re-add it. I like to ensure we cover our bases with outputs

@eseliger

eseliger commented Nov 8, 2022

Copy link
Copy Markdown
Member

2 questions:

  • Do we really want to step away from GitHubs default behavior? Wouldn't that be what customers would expect to happen for a fork?

  • What protects us from the case when there's a repo named the same still? Previously, it would fail if you fork sourcegraph/src-cli into the user namespace if eseliger/src-cli existed, now it'll fail if eseliger/sourcegraph-src-cli exists. Also, for the standard case "repo doesn't conflict" we now create "unexpected" fork names.

Alternative idea:
What if we said If we want to fork:

  • fetch from the code host if <namespace>/<reponame> already exists
  • if not exists: Create
  • if exists: Check if the found repo is already pointing to the correct parent repo
  • if the parent: Go head and use that fork
  • if not the parent: Create a new fork using the naming scheme proposed in this PR

Comment thread enterprise/internal/batches/sources/bitbucketserver.go Outdated
@courier-new

courier-new commented Nov 9, 2022

Copy link
Copy Markdown
Contributor

(Just jotting down a bunch of passing thoughts, I defer to Adeola's judgment on what to do about this!)

Do we really want to step away from GitHubs default behavior? Wouldn't that be what customers would expect to happen for a fork?

I'd actually push back on that. I think it's a stretch to say most customers would expect something specific to happen when forking two repos with the same name, let alone expect what GitHub is doing, and currently it's non-deterministic which one gets forked first and gets the name user/fork vs. which one gets user/fork-1, which doesn't feel great either. I think if I were dealing with changesets across multiple code hosts, I'd rather see consistently-named forks (that are documented as diverging from GitHub's default behavior), so that I could easily tell which of my fork repos were generated from Sourcegraph.

What protects us from the case when there's a repo named the same still? Previously, it would fail if you fork sourcegraph/src-cli into the user namespace if eseliger/src-cli existed, now it'll fail if eseliger/sourcegraph-src-cli exists.

I agree this isn't ideal, but as you point out, this isn't a new problem, it's now just the same problem under a slightly different circumstance. It feels like that problem is adjacent to the one this PR is for, though. Not to say it wouldn't be great to get a 2-for-1, of course. 🙂

Also, for the standard case "repo doesn't conflict" we now create "unexpected" fork names.

"Unexpected" meaning just that generally it's expected that fork name == original name? I actually also don't think this would be a big problem for me so long as we document that we always name forks that way. It sure helps disambiguate things for generic, one-word repo names when the BitBucket UI doesn't tell you anything on the main page about what the repo was forked from (sigh). Perhaps we could even give more of a hint about this on the preview page when we show that little fork indicator on the changeset branch badge?

Alternative idea:
What if we said If we want to fork:

  • fetch from the code host if <namespace>/<reponame> already exists
  • if not exists: Create
  • if exists: Check if the found repo is already pointing to the correct parent repo
  • if the parent: Go head and use that fork
  • if not the parent: Create a new fork using the naming scheme proposed in this PR

Wouldn't this also be confusing if only sometimes the fork repo shares the name with the original repo, and sometimes it doesn't? Especially if, again, the only determining factor for which is which is whoever got there first?

I also think it'd be hard to manage race conditions here, since all the forking stuff would be happening for multiple changesets in parallel. What if we're publishing changesets for both sourcegraph/src-cli and sourcegraph-testing/src-cli at the same time, but both checks find that courier-new/src-cli doesn't already exist? With this logic, they'd both try to create that fork themselves, so we'd be stuck back where we started where one fails on the naming collision. I suppose the retry mechanism would hopefully catch it the second time around and direct it to use the <project>-<repo> name pattern that time, but it feels messy if we let that happen. 😅

@adeola-ak

adeola-ak commented Nov 9, 2022

Copy link
Copy Markdown
Contributor Author

thanks @eseliger and @courier-new for your thoughts here.. really helpful. But yeah, I think I prefer to not use numbers in the names, as I do agree that as a user, I would have any expectation of what to expect in this case of forking repos with the same name, and I dont think that I would be shocked to see this name convention eseliger/sourcegraph-src-cli for example. I also remember us talking about possibly appending a number to the name in the past and I think we agreed that we didnt love that, so since I'd have no expecation as a user, I dont believe we need to do the number now just because github does it that way

fork, err := bbsSrc.GetUserFork(ctx, target)
assert.Nil(t, fork)
assert.ErrorIs(t, err, errNotAFork)
_, err = bbsSrc.GetUserFork(ctx, target)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! If you could re-add it. I like to ensure we cover our bases with outputs

@adeola-ak

Copy link
Copy Markdown
Contributor Author

Unfortunately when trying to update the test to ensure the fork is nil by adding:

fork, err := bbsSrc.GetUserFork(ctx, target)
assert.Nil(t, fork)

I'm now getting the following error: Error "getting username: Bitbucket API HTTP error: code=401 url=\"https://bitbucket.sgdev.org/rest/api/1.0/users?limit=1\" body=\"{\\\"errors\\\":[{\\\"context\\\":null,\\\"message\\\":\\\"Authentication failed.

I dont know how I've suddenly become unauthenticated, and I've taken a number of troubleshooting steps but im not longer able to update these tests.. when i run
curl -H "Authorization: Bearer $BITBUCKET_SERVER_TOKEN" https://bitbucket.sgdev.org/rest/api/1.0/users

I get data back, indicating this is not a token error.. I've spoken to @LawnGnome briefly about this and he isnt sure what can be causing this... @courier-new since I think you were previously authorized to do updates, do you mind helping me add those lines if you dont get an authorization error as well?

@courier-new

Copy link
Copy Markdown
Contributor

I'm happy to try re-recording the test for you, but I think that line can be safely added/modified/removed without affecting the golden test data, since we're not actually changing anything about the request to BitBucket! We have to re-record the tests when we change how bbsSrc.GetUserFork() is implemented, since that's where the request to the code host actually happens, but here we're only changing what we do with the values returned from the method.

(If that's enough to unblock you, we can save figuring out the authentication problem for while we're working on the docs!)

@adeola-ak

adeola-ak commented Nov 11, 2022

Copy link
Copy Markdown
Contributor Author

@courier-new yeah agreed, thanks! I do know that it isnt affecting the golden test data, however when I add that line and re run the rest, the test is now failing with that error when it wasnt before..so I think if i push the changes i'll have buildkite issues, or will i not? Im not sure what you mean by re-recording. I think unblocking me here would be adding the line im trying to add so I have the nil fork coverage, and can successfully merge this PR and then figuring out the auth problem as you suggested

@courier-new

Copy link
Copy Markdown
Contributor

Hmm, how are you trying to run the test when you get that failure? If you're running with -update, it'll try to actually talk to BitBucket again and overwrite the request + response it gets to those golden test data files. That's what I mean by re-record. 🙂 If you run it like a normal test without -update, it shouldn't talk to BitBucket at all, and it should only "replay" the interactions with the request + response it has saved. You shouldn't need to authenticate if you're running the tests in "replay" mode like that.

@adeola-ak

Copy link
Copy Markdown
Contributor Author

@courier-new when i remove the update flag im still getting the same error:

sources % go test -run TestBitbucketServerSource_GetUserFork

--- FAIL: TestBitbucketServerSource_GetUserFork (0.00s)
    --- FAIL: TestBitbucketServerSource_GetUserFork/not_a_fork (0.00s)
        bitbucketserver_test.go:773: 
                Error Trace:    /Users/adeola/Documents/sourcegraph/enterprise/internal/batches/sources/bitbucketserver_test.go:773
                Error:          Error "getting username: Bitbucket API HTTP error: code=401 url=\"https://bitbucket.sgdev.org/rest/api/1.0/users?limit=1\" body=\"{\\\"errors\\\":[{\\\"context\\\":null,\\\"message\\\":\\\"Authentication failed. Please check your credentials and try again.

can you add the lines? or do you think ill be fine if I just add the lines and push the code since they dont affect test data? Im a little scared too since its in a good state on buildkite now

@courier-new

courier-new commented Nov 11, 2022

Copy link
Copy Markdown
Contributor

Do you have a clean working tree beyond these 2 line changes? Is it possible you ran it once with -update previously and overwrote the response data file with the authentication error? Check and make sure you don't have any diffs for files in testdata first. If you do, discard them and try without -update one more time. If you don't... then I'll be really confused. 😬

@adeola-ak

adeola-ak commented Nov 11, 2022

Copy link
Copy Markdown
Contributor Author

@courier-new ohh thanks.. that did work, i still dont know why i have the auth error for when i actually do need to update tests, but thanks for pointing out that i can do this process without -update Ill be sure to add to the docs the importance of clearing response data..I think ill merge this now and get back to working with the other code hosts

@courier-new courier-new left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through the tests for this one. 🙂 Once you've created the follow-up issue for the other code hosts, do you mind adding a comment to link them from this PR and/or from the initial ticket, for tracing purposes?

@adeola-ak

Copy link
Copy Markdown
Contributor Author

Fix for Gitlab PR: #44458

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to fork a repo when multiple repos have same name

5 participants