Adding parsing methods to extract LFS references from arbitrary history#559
Adding parsing methods to extract LFS references from arbitrary history#559technoweenie merged 5 commits intogit-lfs:masterfrom
Conversation
There was a problem hiding this comment.
Not a big deal, but why is this being joined with a global slice when none of the flags change? Couldn't this just use the global slice?
There was a problem hiding this comment.
Does it need the global logLfsSearchArgs? I don't see it referenced anywhere else.
There was a problem hiding this comment.
In both cases the answer to this is it's there for my future use :) The idea is that the local logArgs represents the history filters for this specific function (unpushed), while logLfsSearchArgs are the common arguments that will be needed for all history scans. I'm going to need the distinction and the shared global args for the other work I'm doing on fetch and they're probably useful elsewhere too. So in other functions I'll use a different set of log filtering args but still the standard logLfsSearchArgs to get the format & oid change filter.
There was a problem hiding this comment.
Also note how parseLogOutputToPointers requires that at least logLfsSearchArgs are specified in the git log command that produces the output it consumes, but other params are up to the caller.
|
Looks good. Will this work with pointers that use extensions? The oid will definitely change as the extensions change, so I think it will. |
|
This looks cool 👍 |
Yes, so long as the oid changes the diff will come out in the log search. However if there's a chance that more than 2 extension lines will occur before the oid line then the |
|
Hmm, I don't recall if we set an upper limit on the number of extensions. I can see setting it to something really low, like 3 or 5. 10 is fine for now until we can settle on a real limit. |
|
Yeah |
|
Oh, I see extensions have been merged in now. I've updated this PR so |
|
If you want to get the "actual" file from an lfs pointer with extensions, you'll have to parse all that out. This info is only stored in these pointer files. A plain object in Keep in mind: extensions have not shipped. Now's the time to make changes :) |
|
Hrm, that's going to make pruning (& maybe other functions) a lot more complicated if you can no longer rely on the oid == the storage location. Doesn't this also mean that if an extension changed it could generate a different storage location for the same oid? So changing your gitconfig can completely change the storage location of all assets even if they had the same original oid? It would be a lot simpler for many purposes if there was a 'storage oid' field in the pointer which easily identified where the data is stored, rather than having to locate the highest Also 'Size' is presumably not reliable any more because it represents the original size of the object, not what it is after the extension chain was applied? |
|
And eek, even a small screw up in your extension code can utterly and completely screw up your storage integrity, and every developer has to get it right. I'd hate to support this ;) |
|
Above comments might be wrong, extensions proposal said:
If it's the case that |
|
The OID is still the storage location, and the size matches the byte size of the object in local storage ( Now, if you want to turn the blob into the actual file during checkout, you'll either need all the pointer fields from the log, or you'll need to read the pointer from the git blob.
The Git config only covers the extensions for new files. Existing files will be smudged according to the info in the pointer.
Yeaaaah. At least we only have to worry about the extension system working. But I think that we should advertise extensions as an "experimental" feature. I'm afraid people will rush to build compression filters based on a non deterministic algorithm. |
Phew. Thanks for the confirmation.
OK, my latest commit on this PR includes all the
Agreed, all it takes is for the extension to change over time & 2 developers have different versions, or for people to choose a generic name like 'compress' and for that to have been used by different extensions :o Wild thought: maybe the extensions themselves should have an identifying SHA based on how they translate a known input string? Then if they're incompatible you know before you start trampling over real data. |
|
Git LFS Extensions support 10 extensions numbered 0-9. |
|
OK |
|
This looks good, but I'd like to see some more tests. For instance, none of the tests have extensions in the diffs. One other thing: the It doesn't do anything yet, but I wonder how it'll affect this function if we add a custom "lfs" diff handler. |
|
Good call on the diff handler, that should be explicit. I'll deal with that and add a few cases with extension lines. |
|
Tests added for extension cases. I've tested the use of a custom diff tool in .gitattributes (basically just defining one that called |
|
Great, thanks for checking into that. |
Adding parsing methods to extract LFS references from arbitrary history
This is part of my preparatory work to support the
prunecommand; I thought I would submit this part early both to establish its use while I continue working on the specific features, and in case it is useful elsewhere.The focus of this PR is the
parseLogOutputToPointersfunction which can scan a properly formattedgit logoutput & extract relevant changes to LFS references. I've included an example of its use, ScanUnpushed() which shows what oids haven't been pushed anywhere & therefore only exist locally. However it will also be useful for other tasks (e.g.fetch --recent). The reason for it being able to scan for additions or removals is that depending on which direction in history you're scanning you might want either one (e.g. scanning backwards from a snapshot date between commits you want the '-' diffs to know what was being used beforehand).The main feature of this method compared to alternatives is that it can extract WrappedPointers from arbitrary history with just a single
git logcommand (and can be run in parallel with other tasks if needed)