Conversation
For now, the HTTP code hasn't been moved more than necessary; it's been left in its existing location to aid merging
… full ssh:// URLs. Also corrected a duplicated test which, I assume, was intended to test non-bare SSH URLs
…upport GIT_SSH, Plink & TortoisePlink
…th interleaved byte streams)
…P reference server
…, since they're general
|
Argh, just spent many hours tracking down the stalling issue on Travis, turned out it was nothing to do with ssh - fixed it & was going to submit a PR but @michael-k already got to it: #396 Well played, wish I'd worked on this a day later ;) So now it's all working again & re-integrated with latest changes, will split it up. |
* Full SSH URLs, e.g. ssh://user@host/repo
* Custom ports in both bare and full URLs
* GIT_SSH environment variable for alternate SSH clients
* Explicit support for plink.exe and TortoisePlink.exe
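The bare scp-style form has no way to express a port, which is why the full ssh:// form matters. A minimal Go sketch of telling the two remote forms apart; this is illustrative only, not the parser used in the PR:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitSSHURL distinguishes full ssh:// URLs (which may carry a custom
// port) from bare scp-style "user@host:path" remotes. Hypothetical sketch.
func splitSSHURL(raw string) (user, host, port, path string, err error) {
	if strings.HasPrefix(raw, "ssh://") {
		u, e := url.Parse(raw)
		if e != nil {
			return "", "", "", "", e
		}
		if u.User != nil {
			user = u.User.Username()
		}
		return user, u.Hostname(), u.Port(), strings.TrimPrefix(u.Path, "/"), nil
	}
	// Bare form: [user@]host:path — no way to express a custom port here.
	colon := strings.Index(raw, ":")
	if colon < 0 {
		return "", "", "", "", fmt.Errorf("not an SSH remote: %q", raw)
	}
	userHost, path := raw[:colon], raw[colon+1:]
	if at := strings.Index(userHost, "@"); at >= 0 {
		user, host = userHost[:at], userHost[at+1:]
	} else {
		host = userHost
	}
	return user, host, "", path, nil
}

func main() {
	for _, r := range []string{"ssh://git@example.com:2222/team/repo.git", "git@example.com:team/repo.git"} {
		u, h, p, pa, _ := splitSSHURL(r)
		fmt.Printf("user=%s host=%s port=%s path=%s\n", u, h, p, pa)
	}
}
```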
|
I've split out the SSH URL/port/GIT_SSH support as #404 & re-merged this one on top. I've also split out a couple of other small ones in case you want to do those separately, but #404 is the main one. If you don't want to merge this until after the API changes, then I won't do any more work on this PR for the moment, since merging it does take a bit of work each time (regular conflicts, and I have to be careful to preserve modifications to moved code). I think there's a fairly significant discussion to be had around exactly how DownloadObjects will work; I think you still need the Batch() up-front to estimate transfer times. FYI, I'm splitting my time between Git LFS and something else now, so I'll be back on this early next week barring small things. |
|
I just wanted to throw a +1 here to show my support. I think this would be a great feature to have. |
|
👍 |
|
Hey @AndrewJDR, just came across this PR again in my inbox. We just shipped #1236 which should be a big step in making it possible to easily implement this with minimal effort. 😄 |
|
I noticed this was closed. Does that mean there are no plans for a pure SSH implementation that ships with the official git-lfs package? |
|
I think it is something that we're interested in long-term, but not right now. |
|
Well for what it's worth, I'll be very happy when this gets implemented some day. If I had more free time, I'd do it myself. |
|
Yeah, it's way out of date now. I'd like a pure SSH route one day too just on principle, I think if you've chosen to use SSH you should be able to complete everything that way if you want. But in practice most providers end up with the lfs storage accessible primarily over HTTPS anyway, and you get some nice automatic features that way, so in practice it turned out to be a bit niche. I'm an old traditionalist though (shakes fist at cloud) so I'd like it to happen eventually. |
|
@sinbad If pure SSH does happen one day, please revive or at least throw a post into this thread so I get notified :) Thanks! |
|
Yeah, I'm glad I saw this before I implemented LFS for our project. The promise of "git lfs" is that "everything will just work." If you have no interest in supporting SSH properly, then we need the special GitHub-style LFS HTTP server with the special GitHub-style protocol, which means that you can't just run a server on any old Linux box with ssh, which means that LFS isn't really a standard at all. Will check back in a couple years once you guys figure this one out. |
|
@johnwbyrd This is a fair criticism of git-lfs imo. If you're looking for something that 'just works' over ssh, I can suggest looking at git-fat: https://github.com/ciena-blueplanet/git-fat From github's perspective, I think they made git-lfs to serve their own purposes as a company and opening it up was just gravy. I suppose I understand that there is maybe little incentive for them to implement this. That said... I really, really wish someone would implement this. |
|
Thanks for the feedback. The lack of an SSH protocol is definitely fair criticism of LFS. There are a lot of other core issues with LFS, and our small team is working on what we think are the most crucial ones. However, I'd love to work with someone on implementing a pure SSH protocol. |
|
The more I think about this problem, the more I think that the fundamental approaches taken by LFS, annex, fat, etc. are unsound. They all take the basic approach of creating a parallel metadata repository and filtering out certain files so they are stored in the metadata repository rather than in git. This then creates a parallel problem of managing the metadata repository. So now you've taken one problem and made two problems out of it.

Linus got it right the first time. Git's secret weapon is its distributed nature: any git install is both a client and a server. But everyone is trying to solve this problem in a client/server manner. I believe there is a better way, but it's not going to be got by messing around with the porcelain of Git, or by building servers to manage big data in parallel.

Recall that if two git users have different local branches, their repositories can look quite different. One user can have a branch called "master" and another branch called "bigfiles"; another user can have just a branch called "master". The sizes of their repositories will be quite different, even though they are both working on the same master branch.

I posit that the central problem is not managing "big" files and rerouting them into a central server with a filter. The central problem is deciding under which conditions certain refs and/or branches should be pushed or pulled between repositories, and then deciding how those refs can be combined and/or overlaid and presented to the user at checkout time. It is possible that this might be accomplished merely with another level of indirection. Like git lfs, this proposed method would have blobs containing SHA1 hashes which would be "fixed up" with a smudge filter, but unlike lfs, those smudge filters would look for blobs or trees in the current repository that match the given SHA1 hash. If they exist, you have a "full" repository; if not, you don't currently have access to the "big" files and still need to get them from someone else.

Look at what such a system would buy you. You could decide whether you wanted to be constantly up to date with a detailed history of every "big" asset, you could say you only wanted to track the latest "big" asset in the repository (conveniently, git gc gets rid of dangling blobs for you), or you could say you don't want to see big assets at all. And based on the branches and/or refs you receive at push/pull time, git itself could work out whether the "big" files need to go over the wire, and use its existing systems to transfer them if they do. No special servers or protocols needed.

The question is not whether big files should be stored in git. They should. The question is who should have merely the SHA1s of those big files in their repository, and who should have the big files themselves.

This is probably worthy of a blog post or longer discussion, and I'd be very surprised if I were the first person to think of this approach. I'll get out of this PR now (I know this discussion does not belong here) and come back when I have something useful. Thanks for listening. |
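The smudge-time lookup described above can be sketched in a few lines of Go, with an in-memory map standing in for git's object database. The `sha1:` pointer format and the function name are hypothetical, chosen only to illustrate the idea:

```go
package main

import (
	"fmt"
	"strings"
)

// smudge sketches the proposed indirection: the checked-in blob is just
// "sha1:<hash>\n"; at checkout time the filter looks that hash up locally.
// If the object is present you get the real content ("full" repository);
// if not, the pointer stays and the repository is "incomplete".
// objectStore is a stand-in for git's own object database.
func smudge(pointer string, objectStore map[string][]byte) ([]byte, bool) {
	hash := strings.TrimSpace(strings.TrimPrefix(pointer, "sha1:"))
	if content, ok := objectStore[hash]; ok {
		return content, true // full repository: the big file materializes
	}
	return []byte(pointer), false // incomplete: keep the pointer as-is
}

func main() {
	store := map[string][]byte{"da39a3ee": []byte("big binary payload")}
	out, full := smudge("sha1:da39a3ee\n", store)
	fmt.Printf("full=%v content=%q\n", full, out)
	out, full = smudge("sha1:deadbeef\n", store)
	fmt.Printf("full=%v content=%q\n", full, out)
}
```

The key design point is that the lookup never leaves the local repository: whether you hold the big files is decided entirely by which refs you chose to fetch.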
|
Interesting point! I think being able to set up a file like .gitignore where you put in lines like "*.png", so that those files are cloned/pulled to your repository only when you check out a commit, sounds great and would feel more git-like. You could have all these files garbage-collected when you haven't checked them out for a week or so. It would also be great if you could specify certain file sizes and combine them with file types for automatic purging from the repo. As these actions are potentially dangerous (you are telling your VCS to silently drop content you put into it in the first place), it should show a notice like "incomplete repository" when calling "git status".

If you are interested in a native git-like solution, you can achieve kind-of-the-same by putting large files into submodules and shallow-cloning those submodules. This way, you keep the entire source code history but have only a subset of the big files shallow-cloned to your local submodule. |
|
@johnwbyrd did you ever follow up with an implementation or more discussion somewhere? |
|
Nope, Real Life has gotten in the way. In my copious spare time. Real Soon Now. In the meantime, note that "git overlay" as a concept means several different things, so I'll need to go with "git layer" or some similar term. Also note that the new "git worktree" concept is about two doors down from the thing I'm proposing. |
|
@johnwbyrd It's okay, I understand that time is precious. |
|
git worktree is a new git feature that demonstrates, at least conceptually, that two branches of the same git repository can be checked out at the same moment. git worktree does nothing about reconciling their differences into a single directory, nor was it intended to do that. |

Implements a pure SSH path, allowing you to perform LFS operations entirely over SSH instead of using SSH only for auth and HTTP for uploading/downloading. Useful for those who want to self-host using existing SSH key access without running a web server. SSH key authentication is required.
A corresponding reference server implementation is at https://github.com/sinbad/git-lfs-ssh-serve (there's also a mock SSH client in test/cmd/lfstest-mockssh, used for integration testing).
Other features of this PR:
Future enhancements:
Previously submitted as #350