Conversation
For now, the HTTP code hasn't been moved more than necessary; it's been left in its existing location to aid merging
… full ssh:// URLs. Also corrected a duplicated test which, I assume, was intended to test non-bare SSH URLs
…upport GIT_SSH, Plink & TortoisePlink
…th interleaved byte streams)
…P reference server
…, since they're general
|
Argh, just spent many hours tracking down the stalling issue on Travis, turned out it was nothing to do with ssh - fixed it & was going to submit a PR but @michael-k already got to it: #396 Well played, wish I'd worked on this a day later ;) So now it's all working again & re-integrated with latest changes, will split it up. |
* Full SSH URLs, e.g. ssh://user@host/repo
* Custom ports in both bare and full URLs
* GIT_SSH environment variable for alternate SSH clients
* Explicit support for plink.exe and TortoisePlink.exe
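The bare scp-style form has no way to express a port, which is why the full ssh:// form matters. A minimal Go sketch of telling the two remote forms apart; this is illustrative only, not the parser used in the PR:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitSSHURL distinguishes full ssh:// URLs (which may carry a custom
// port) from bare scp-style "user@host:path" remotes. Hypothetical sketch.
func splitSSHURL(raw string) (user, host, port, path string, err error) {
	if strings.HasPrefix(raw, "ssh://") {
		u, e := url.Parse(raw)
		if e != nil {
			return "", "", "", "", e
		}
		if u.User != nil {
			user = u.User.Username()
		}
		return user, u.Hostname(), u.Port(), strings.TrimPrefix(u.Path, "/"), nil
	}
	// Bare form: [user@]host:path — no way to express a custom port here.
	colon := strings.Index(raw, ":")
	if colon < 0 {
		return "", "", "", "", fmt.Errorf("not an SSH remote: %q", raw)
	}
	userHost, path := raw[:colon], raw[colon+1:]
	if at := strings.Index(userHost, "@"); at >= 0 {
		user, host = userHost[:at], userHost[at+1:]
	} else {
		host = userHost
	}
	return user, host, "", path, nil
}

func main() {
	for _, r := range []string{"ssh://git@example.com:2222/team/repo.git", "git@example.com:team/repo.git"} {
		u, h, p, pa, _ := splitSSHURL(r)
		fmt.Printf("user=%s host=%s port=%s path=%s\n", u, h, p, pa)
	}
}
```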
|
I've split out the SSH URL/port/GIT_SSH support as #404 & re-merged this one on top. I've also split out a couple of other small ones in case you want to do those separately, but #404 is the main one. If you don't want to merge this until after the API changes, then I won't do any more work on this PR for the moment, since merging it does take a bit of work each time (regular conflicts, and I have to be careful to preserve modifications to moved code). I think there's a fairly significant discussion to be had around exactly how DownloadObjects will work; I think you still need the Batch() up-front to estimate transfer times. FYI, I'm splitting my time between Git LFS and something else now, so I'll be back on this early next week barring small things. |
|
I just wanted to throw a +1 here to show my support. I think this would be a great feature to have. |
|
👍 |
|
Hey @AndrewJDR, just came across this PR again in my inbox. We just shipped #1236 which should be a big step in making it possible to easily implement this with minimal effort. 😄 |
|
I noticed this was closed. Does that mean there are no plans for a pure SSH implementation that ships with the official git-lfs package? |
|
I think it is something that we're interested in long-term, but not right now. |
|
Well for what it's worth, I'll be very happy when this gets implemented some day. If I had more free time, I'd do it myself. |
|
Yeah, it's way out of date now. I'd like a pure SSH route one day too just on principle, I think if you've chosen to use SSH you should be able to complete everything that way if you want. But in practice most providers end up with the lfs storage accessible primarily over HTTPS anyway, and you get some nice automatic features that way, so in practice it turned out to be a bit niche. I'm an old traditionalist though (shakes fist at cloud) so I'd like it to happen eventually. |
|
@sinbad If pure SSH does happen one day, please revive or at least throw a post into this thread so I get notified :) Thanks! |
|
Yeah, I'm glad I saw this before I implemented LFS for our project. The promise of "git lfs" is that "everything will just work." If you have no interest in supporting SSH properly, then we need the special GitHub-style LFS HTTP server with the special GitHub-style protocol, which means that you can't just run a server on any old Linux box with ssh, which means that LFS isn't really a standard at all. Will check back in a couple years once you guys figure this one out. |
|
@johnwbyrd This is a fair criticism of git-lfs imo. If you're looking for something that 'just works' over ssh, I can suggest looking at git-fat: https://github.com/ciena-blueplanet/git-fat From github's perspective, I think they made git-lfs to serve their own purposes as a company and opening it up was just gravy. I suppose I understand that there is maybe little incentive for them to implement this. That said... I really, really wish someone would implement this. |
|
Thanks for the feedback. The lack of an SSH protocol is definitely fair criticism of LFS. There are a lot of other core issues with LFS, and our small team is working on what we think are the most crucial ones. However, I'd love to work with someone on implementing a pure SSH protocol. |
|
The more I think about this problem, the more I think that the fundamental approaches taken by LFS, annex, fat, etc. are unsound. They all take the basic approach of creating a parallel metadata repository and filtering out certain files so they are stored in the metadata repository rather than in git. This then creates a parallel problem of managing the metadata repository. So now you've taken one problem and made two problems out of it.

Linus got it right the first time. Git's secret weapon is its distributed nature: any git install is both a client and a server. But everyone is trying to solve this problem in a client/server manner. I believe there is a better way, but it's not going to be got by messing around with the porcelain of Git, or by building servers to manage big data in parallel.

Recall that if two git users have different local branches, their repositories can look quite different. One user can have a branch called "master" and another branch called "bigfiles"; another user can have just a branch called "master". The sizes of their repositories will be quite different, even though they are both working on the same master branch.

I posit that the central problem is not managing "big" files and rerouting them into a central server with a filter. The central problem is deciding under which conditions certain refs and/or branches should be pushed or pulled between repositories, and then deciding how those refs can be combined and/or overlaid and presented to the user at checkout time. It is possible that this might be accomplished merely with another level of indirection. Like git lfs, this proposed method would have blobs containing SHA1 hashes which would be "fixed up" with a smudge filter, but unlike lfs, those smudge filters would look for blobs or trees in the current repository that match the given SHA1 hash. If they exist, you have a "full" repository; if not, you don't currently have access to the "big" files and still need to get them from someone else.

Look at what such a system would buy you. You could decide whether you wanted to be constantly up to date with a detailed history of every "big" asset, you could say you only wanted to track the latest "big" asset in the repository (conveniently, git gc gets rid of dangling blobs for you), or you could say you don't want to see big assets at all. And based on the branches and/or refs you receive at push/pull time, git itself could work out whether the "big" files need to go over the wire, and use its existing systems to transfer them if they do. No special servers or protocols needed.

The question is not whether big files should be stored in git. They should. The question is who should have merely the SHA1s of those big files in their repository, and who should have the big files themselves.

This is probably worthy of a blog post or longer discussion, and I'd be very surprised if I were the first person to think of this approach. I'll get out of this PR now (I know this discussion does not belong here) and come back when I have something useful. Thanks for listening. |
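The smudge-time lookup described above can be sketched in a few lines of Go, with an in-memory map standing in for git's object database. The `sha1:` pointer format and the function name are hypothetical, chosen only to illustrate the idea:

```go
package main

import (
	"fmt"
	"strings"
)

// smudge sketches the proposed indirection: the checked-in blob is just
// "sha1:<hash>\n"; at checkout time the filter looks that hash up locally.
// If the object is present you get the real content ("full" repository);
// if not, the pointer stays and the repository is "incomplete".
// objectStore is a stand-in for git's own object database.
func smudge(pointer string, objectStore map[string][]byte) ([]byte, bool) {
	hash := strings.TrimSpace(strings.TrimPrefix(pointer, "sha1:"))
	if content, ok := objectStore[hash]; ok {
		return content, true // full repository: the big file materializes
	}
	return []byte(pointer), false // incomplete: keep the pointer as-is
}

func main() {
	store := map[string][]byte{"da39a3ee": []byte("big binary payload")}
	out, full := smudge("sha1:da39a3ee\n", store)
	fmt.Printf("full=%v content=%q\n", full, out)
	out, full = smudge("sha1:deadbeef\n", store)
	fmt.Printf("full=%v content=%q\n", full, out)
}
```

The key design point is that the lookup never leaves the local repository: whether you hold the big files is decided entirely by which refs you chose to fetch.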
|
Interesting point! I think being able to set up a file like .gitignore where you put in lines like "*.png", so that those files are cloned/pulled to your repository only when you check out a commit, sounds great and would feel more git-like. You could have all these files garbage-collected when you haven't checked them out for a week or so. It would also be great if you could specify certain file sizes and combine them with file types for automatic purging from the repo. As these actions are potentially dangerous (you are telling your VCS to silently drop content you put into it in the first place), it should show a notice like "incomplete repository" when calling "git status".

If you are interested in a native git-like solution, you can achieve kind-of-the-same by putting large files into submodules and shallow-cloning those submodules. This way, you keep the entire source code history but have only a subset of the big files shallow-cloned to your local submodule. |
|
@johnwbyrd did you ever follow up with an implementation or more discussion somewhere? |
|
Nope, Real Life has gotten in the way. In my copious spare time. Real Soon Now. In the meantime, note that "git overlay" as a concept means several different things, so I'll need to go with "git layer" or some similar term. Also note that the new "git worktree" concept is about two doors down from the thing I'm proposing. |
|
@johnwbyrd It's okay, I understand that time is precious. |
|
git worktree is a new git feature that demonstrates, at least conceptually, that two branches of the same git repository can be checked out at the same moment. git worktree does nothing about reconciling their differences into a single directory, nor was it intended to do that. |

Implements a pure SSH path, allowing you to perform LFS operations entirely over SSH instead of using SSH only for auth and HTTP for uploading/downloading. Useful for those who want to self-host using existing SSH key access without running a web server. SSH key authentication is required.
A corresponding reference server implementation is at https://github.com/sinbad/git-lfs-ssh-serve (there's also a mock SSH client in test/cmd/lfstest-mockssh, used for integration testing).
Other features of this PR:
Future enhancements:
Previously submitted as #350