
Implementation of pure SSH API#350

Closed
sinbad wants to merge 58 commits into git-lfs:master from sinbad:feature/full_ssh_multitransfer

Conversation

@sinbad

@sinbad sinbad commented May 29, 2015

This PR makes git-lfs compatible with a pure SSH server implementation, rather than only authenticating over SSH and then doing everything else over HTTP. This is particularly useful for people self-hosting who don't want the overhead of setting up and maintaining a web server.

Corresponding reference server implementation is at https://github.com/sinbad/git-lfs-ssh-serve

The PR in its current state focuses on a pure SSH route (including upload/download). Future extensions might include a hybrid setup where again no web server is required for any API calls (UploadCheck/DownloadCheck), but Upload/Download can still be redirected to hypermedia links (most likely S3).

The PR also adds support for a few more SSH URL forms, including custom ports. The existing git-lfs-authenticate route is not affected.
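As a rough illustration of what handling those extra URL forms involves (the type and function names below are my own sketch, not the actual git-lfs parser), a full ssh:// URL can carry a custom port while the scp-style form cannot:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sshEndpoint holds the parts needed to invoke ssh for the LFS API.
// Illustrative only; not git-lfs's actual representation.
type sshEndpoint struct {
	User, Host, Port, Path string
}

// parseSSHRemote handles both full ssh:// URLs (which may carry a
// custom port) and the scp-style "user@host:path" form git accepts.
func parseSSHRemote(raw string) (sshEndpoint, error) {
	if strings.HasPrefix(raw, "ssh://") {
		u, err := url.Parse(raw)
		if err != nil {
			return sshEndpoint{}, err
		}
		ep := sshEndpoint{
			Host: u.Hostname(),
			Port: u.Port(), // empty string if no custom port given
			Path: strings.TrimPrefix(u.Path, "/"),
		}
		if u.User != nil {
			ep.User = u.User.Username()
		}
		return ep, nil
	}
	// scp-like syntax: [user@]host:path (no custom port possible here)
	hostPart, repoPath, found := strings.Cut(raw, ":")
	if !found {
		return sshEndpoint{}, fmt.Errorf("not an SSH remote: %s", raw)
	}
	ep := sshEndpoint{Host: hostPart, Path: repoPath}
	if user, host, ok := strings.Cut(hostPart, "@"); ok {
		ep.User, ep.Host = user, host
	}
	return ep, nil
}

func main() {
	ep, _ := parseSSHRemote("ssh://git@example.com:2222/org/repo.git")
	fmt.Println(ep.User, ep.Host, ep.Port, ep.Path)
	ep, _ = parseSSHRemote("git@example.com:org/repo.git")
	fmt.Println(ep.User, ep.Host, ep.Port, ep.Path)
}
```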

sinbad added 30 commits May 19, 2015 14:59
For now, HTTP code hasn't been moved more than necessary, left in existing location to aid merging
… full ssh:// URLs

Also corrected a duplicated test which was, I assume, intended to test non-bare SSH URLs
lfs/ssh_test.go (outdated)

Contributor:

Change this to "github.com/github/git-lfs/vendor/_nuts/github.com/technoweenie/assert" to get the test to pass.

sinbad (Contributor Author):

Oops, didn't spot that when I merged in the nut changes, cheers.

Contributor:

This interface isn't really ready to be extracted. We don't intend to maintain support for the current upload/download API once the batch API is finalized. I'm thinking an eventual interface would look something like this.

```go
type ApiContext interface {
  Endpoint() Endpoint
  Close() error
  DownloadAll(objects []*ObjectResource)
  UploadAll(objects []*ObjectResource)
}
```

I think an ApiContext should be responsible for a single Git LFS operation: a smudge filter downloading a single object, or a push command uploading 100 objects from a range of commits. I think that would remove the need to access cached ApiContext objects through GetApiContext(). I can't think of a reason you'd need an ApiContext for multiple endpoints, or multiple contexts for the same endpoint, in a single Git LFS command.
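To illustrate the single-operation idea (the fakeContext type below is hypothetical, purely for demonstration; only the ApiContext interface comes from the comment above):

```go
package main

import "fmt"

// Minimal stand-ins for the real git-lfs types, for illustration.
type ObjectResource struct{ Oid string }
type Endpoint struct{ URL string }

// ApiContext as sketched in the review comment above.
type ApiContext interface {
	Endpoint() Endpoint
	Close() error
	DownloadAll(objects []*ObjectResource)
	UploadAll(objects []*ObjectResource)
}

// fakeContext is a hypothetical implementation used only to show the
// one-context-per-operation usage pattern.
type fakeContext struct {
	ep       Endpoint
	uploaded int
}

func (c *fakeContext) Endpoint() Endpoint                 { return c.ep }
func (c *fakeContext) Close() error                       { return nil }
func (c *fakeContext) DownloadAll(objs []*ObjectResource) {}
func (c *fakeContext) UploadAll(objs []*ObjectResource)   { c.uploaded += len(objs) }

func main() {
	// One context per Git LFS operation: a push uploads its whole
	// object set through a single context, then tears it down.
	var ctx ApiContext = &fakeContext{ep: Endpoint{URL: "ssh://git@example.com/repo.git"}}
	defer ctx.Close()
	ctx.UploadAll([]*ObjectResource{{Oid: "a"}, {Oid: "b"}})
	fmt.Println("uploaded via", ctx.Endpoint().URL)
}
```

Under this shape there is no global registry to consult; the command that owns the operation owns the context's lifetime.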

sinbad (Contributor Author):

OK - I was just working with what was there; happy for it to be 100% batch oriented too. In theory, yes, that would eliminate the need for cached contexts within the command-line usage model, although it does no real harm for them to be there. The SSH transport will always need a discrete setup and tear-down step anyway, so whether it spans multiple operations or just one makes little difference in added complexity.

sinbad (Contributor Author):

FWIW, I was developing this in parallel with the interface morphing (Upload disappeared and Check/Object/Batch variants appeared), and it was really no big deal to stay in step. I don't think you necessarily need to wait until your interface is 'final' before allowing it to be abstracted like this - although it depends on how many third-party implementations you were expecting (I wasn't really expecting any).

Contributor:

Honestly, I'd love to reject all other implementations and figure out some way to implement custom ones through commands. Until then, I think we can merge this and move forward with the API design incrementally. We're likely not accepting other implementations any time soon; each additional client makes the API that much harder to evolve and maintain.

I'd really like to get rid of GetApiContext if we don't need it. That can wait until everything is using the *All() batch methods though.

sinbad (Contributor Author):

Yeah, the concept of a context surviving multiple API calls is necessary until you can guarantee that only one API call will be made per command, unless you want a separate SSH connection fired up for every file. Even after the *All() batch methods are implemented, though, I think it's nice to keep that flexibility; replacing it would just be equivalent to a global context per API, and I'm not sure that actually simplifies much - it just bakes in an assumption.

@technoweenie

Before merging, I'd like to get some assurance that you'll be around to support it. We're not planning on using it in production anywhere. I don't see this being a problem, but I just want to be clear :) I also still need to set this up locally and see the flow working, and make sure our existing servers all still work.

After merging, here are some things I'd like to see:

  • Git config to immediately fall back to git-lfs-authenticate.
  • Implement the *All() batch methods, and get rid of ApiContext.
  • Integration tests would be awesome. I wonder if we can spin up a fake ssh server...
  • Eventually merge the SSH commands. git-lfs-ssh-authenticate (command name doesn't matter) could verify the auth, and then send a response that tells the client either to send further commands over the connection, or to use the given token to talk to the API.
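The fake-SSH-server idea could start as an in-process pipe harness before involving a real ssh daemon. This sketch mimics the stdin/stdout framing an ssh session provides; the request/response protocol and all names here are invented for illustration, not the git-lfs wire format:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// fakeServe is a minimal stand-in for an SSH-served LFS endpoint: it
// reads request lines from in and writes one response line each to
// out, just as a real server would over the ssh session's pipes.
func fakeServe(in io.Reader, out io.WriteCloser) {
	defer out.Close()
	scanner := bufio.NewScanner(in)
	for scanner.Scan() {
		fmt.Fprintf(out, "ok %s\n", scanner.Text())
	}
}

// roundTrip wires a "client" to the fake server with two in-memory
// pipes (one per direction) and performs a single request/response.
func roundTrip(req string) string {
	clientIn, serverOut := io.Pipe() // server -> client
	serverIn, clientOut := io.Pipe() // client -> server
	go fakeServe(serverIn, serverOut)

	fmt.Fprintln(clientOut, req)
	clientOut.Close() // signal EOF so the server loop can finish
	resp, _ := bufio.NewReader(clientIn).ReadString('\n')
	return resp
}

func main() {
	fmt.Print(roundTrip("download abc123"))
}
```

This still wouldn't exercise real ssh auth or channel setup, but it would let integration tests drive the full request/response cycle automatically.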

Regarding a git config fallback: if I understand this, it falls back to the hybrid HTTPS/SSH authentication method when connect() discards the error and returns a nil context. Any thoughts on writing a key like remote.{name}.lfs**** to the repo's git config when this happens, so that it doesn't have to do all this every time a user accesses their remote? We're experimenting with this idea in #358.

@sinbad

sinbad commented Jun 5, 2015

Yeah, I'd love to figure out how to fully test over real SSH instead of having to use a pipe; that last little piece of non-automated testing bothers me. I couldn't figure out a way to fake it without still having a gap - a fake SSH server in Go is possible, but it still wouldn't be 100% real. Then again, it would still be better than nothing, so maybe I should have a crack at it. Before you merge I'd like to do a few more tests myself anyway, since I had to finish this while travelling and didn't have my office servers available to test with. Also, I don't have a live git-lfs-ssh-authenticate implementation to test with - I assume GitHub's implementation is not available as source. I've signed up for the GitHub LFS early access queue, which I guess is the best way to test this; maybe you can get that enabled for me?

Config fallback to remember the preferred route sounds sensible; the only issue is dealing with transitory error conditions, e.g. someone trying to push while offline - you wouldn't want to bake in that failure as a preference. I guess that means only setting the option after a successful use of that path.

About maintenance: I'm assigned to this area by Atlassian for at least the next few months, and I have a personal interest in it anyway, so I'd probably be tinkering with it even if I weren't. I can't make any firm promises for the future though, real life etc. :) I think it's useful even if marked 'experimental', with GitHub's line being simply that you don't support it - the refactoring is useful anyway.

Alternatively, if you're worried it will slow you down while you're adapting the API, you could merge just the stuff that's immediately useful (SSH URLs, plink/Tortoise support) now that I know you're OK with the principles, and I can maintain my SSH fork in parallel and see what the community thinks. The main issue with that is that the refactoring required moving some code around, which becomes harder to maintain over time. That said, it was only about 10 days' work, so if I had to redo it manually I could do it more quickly; it just might not always be in sync. I'm happy with whichever approach you want to take.

@sinbad

sinbad commented Jun 5, 2015

Found a couple of auth prompting issues now that I can test in real environments; hold off merging either way until I've nailed them down.
[Edit] False alarm actually - it was caused by a bad default remote that had nothing to do with the SSH setup.

@sinbad

sinbad commented Jun 5, 2015

I can sometimes stall it with large numbers of files when using git lfs fetch, though (but not git checkout - maybe an issue with the batch implementation). Working on that.

@sinbad

sinbad commented Jun 9, 2015

OK, finally figured out why - my misunderstanding of how your goroutines worked. It needs a bit of a design change to accommodate and serialise transfers to the SSH pipe(s); I'll resubmit once I've sorted it out.
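One common shape for that kind of design change (purely my sketch, not the actual fix that went into the fork) is to funnel the concurrent transfer goroutines into a single writer goroutine that owns the pipe, so their frames can never interleave on the wire:

```go
package main

import (
	"fmt"
	"sync"
)

// serialise fans concurrent transfer goroutines into one writer
// goroutine that owns the (stand-in) SSH pipe. Each worker sends a
// request and blocks until its request has been written, so requests
// never interleave on the wire. Illustrative sketch only.
func serialise(oids []string) []string {
	type request struct {
		oid  string
		done chan struct{}
	}
	reqs := make(chan request)
	var pipeLog []string // stands in for the real SSH stdin pipe

	// Sole owner of the pipe: all writes happen in this goroutine.
	go func() {
		for r := range reqs {
			pipeLog = append(pipeLog, r.oid)
			close(r.done) // tell the worker its request is on the wire
		}
	}()

	var wg sync.WaitGroup
	for _, oid := range oids {
		wg.Add(1)
		go func(oid string) {
			defer wg.Done()
			done := make(chan struct{})
			reqs <- request{oid: oid, done: done}
			<-done // block until our turn on the pipe has finished
		}(oid)
	}
	wg.Wait()
	close(reqs)
	return pipeLog
}

func main() {
	fmt.Println(len(serialise([]string{"a", "b", "c"})), "requests serialised")
}
```

The per-request done channel also gives each worker a natural place to wait for its response before releasing the pipe, if the protocol is strictly request/response.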

@sinbad sinbad closed this Jun 9, 2015
@gully gully mentioned this pull request Feb 25, 2016