1d1f8c6 to 7cce4dd
PastelMobileSuit left a comment
✨ This looks great! I just had a couple of comments about integration tests.
t/t-attributes.sh
    assert_local_object "$(calc_oid "$contents")" 9

    contents2="other data"
    printf "$contents" > dir/foo.bin
Should this be $contents2 instead of $contents?
Ah, yeah, that should be $contents2.
t/t-attributes.sh
    mkdir dir
    printf '[attr]lfs filter=lfs diff=lfs merge=lfs -text\n*.dat lfs\n' \
      > .gitattributes
    printf '[attr]lfs2 filter=lfs diff=lfs merge=lfs -text\n*.dat lfs2\n' \
Should this say *.bin here instead of *.dat?
    printf "$contents" > foo.dat
    git add *.dat
    git commit -m 'foo.dat'
    assert_local_object "$(calc_oid "$contents")" 9
Since the object exists locally in Git, I think this will be true regardless of whether or not the file is tracked. I think we should have a git lfs track call that ensures the pattern is already supported, as in the above test.
You're right. Git will already have run it through the filter, so the only case in which we wouldn't have the object would be if Git hadn't honored the setting, and we're not testing Git.
I'll leave this as it is for the purposes of ensuring general sanity, but add a call to git lfs track.
Ah, one thing to mention is that on this line (unlike below), we're working solely with files outside of the working tree, which git lfs track intentionally ignores (except to read macros). Below, we don't have that problem, since the file in the working tree isn't ignored.
I'll need to come up with a different way to test this part.
    printf "$contents" > foo.dat
    git add *.dat
    git commit -m 'foo.dat'
    assert_local_object "$(calc_oid "$contents")" 9
Same here as above: I think we need a git lfs track call to ensure the pattern is already supported.
Okay, all issues addressed.
PastelMobileSuit left a comment
✨ Thanks for addressing my feedback!
096517b to ba20d0c
Git specifically defines the "-filter" syntax as meaning "filter=false". Consequently, it's not valid to write "-filter=lfs", since that provides two possible values for the filter attribute. Since this is not valid syntax, adjust the line in the tests.
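For reference, gitattributes(5) describes four states an attribute can have for a path: set ("text"), unset ("-text"), set to a value ("text=value"), and unspecified ("!text"). The minimal Go sketch below classifies single tokens that way and, following the reading in this commit message, rejects a token such as "-filter=lfs" that combines a prefix with a value. AttrState, Attr, and parseAttr are hypothetical names, not the git-lfs gitattr API, and Git itself may treat such a token more leniently.

```go
package main

import (
	"fmt"
	"strings"
)

// AttrState is a hypothetical enumeration of the gitattributes(5) states.
type AttrState int

const (
	Set         AttrState = iota // "text"
	Unset                        // "-text"
	Unspecified                  // "!text"
	Value                        // "filter=lfs"
)

// Attr is a hypothetical parsed attribute token.
type Attr struct {
	Name  string
	State AttrState
	Val   string
}

// parseAttr classifies one token. A "-" or "!" prefix combined with
// "=value" (for example "-filter=lfs") is rejected, since it would give
// the attribute two settings at once.
func parseAttr(tok string) (Attr, error) {
	name, val, hasVal := strings.Cut(tok, "=")
	switch {
	case strings.HasPrefix(name, "-") || strings.HasPrefix(name, "!"):
		if hasVal {
			return Attr{}, fmt.Errorf("ambiguous attribute %q", tok)
		}
		if name[0] == '-' {
			return Attr{Name: name[1:], State: Unset}, nil
		}
		return Attr{Name: name[1:], State: Unspecified}, nil
	case hasVal:
		return Attr{Name: name, State: Value, Val: val}, nil
	default:
		return Attr{Name: name, State: Set}, nil
	}
}
```

With this sketch, "-filter" parses as an unset "filter" attribute, while "-filter=lfs" returns an error.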
Update wildmatch to v1.0.1 so that we can handle escaped hash marks in gitattributes files.
We currently have two separate pieces of code for parsing attributes: the attribs.go file in the git package, and the gitattr package. Unify them by converting the former to call the latter. In doing so, move the line ending splitter from the former package to the latter package and have the ParseLines call in the latter package return an additional value which represents the proper line ending value for the file read.
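A minimal sketch of a splitter that also reports a line ending. It assumes the ending is taken from the first terminated line. splitLines is a hypothetical name, not the function the commit moves. A caller could use the reported ending, for example, to append new patterns in the file's existing style.

```go
package main

import (
	"bufio"
	"strings"
)

// splitLines splits gitattributes content into lines and reports the line
// ending used by the first terminated line ("\r\n" or "\n"), or "" if the
// input has no line terminator at all.
func splitLines(data string) (lines []string, eol string) {
	sc := bufio.NewScanner(strings.NewReader(data))
	sc.Split(bufio.ScanLines) // ScanLines drops a trailing "\r" from each line
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if i := strings.IndexByte(data, '\n'); i >= 0 {
		eol = "\n"
		if i > 0 && data[i-1] == '\r' {
			eol = "\r\n"
		}
	}
	return lines, eol
}
```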
In certain gitattributes files, it's possible to specify macros in addition to plain patterns. Add basic support for parsing these macros.
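A minimal Go sketch of recognising macro definitions. It assumes whitespace-separated fields and ignores quoted patterns and other details of the real parser. parsedLine and parseLine are hypothetical names.

```go
package main

import "strings"

// parsedLine is a hypothetical parsed gitattributes line: Macro holds the
// name for an "[attr]name ..." definition, otherwise Pattern holds the
// path pattern. Attrs holds the raw attribute tokens in both cases.
type parsedLine struct {
	Pattern string
	Macro   string
	Attrs   []string
}

const macroPrefix = "[attr]"

// parseLine recognises the "[attr]" prefix that marks a macro definition.
// Blank lines and "#" comments return ok == false.
func parseLine(s string) (l parsedLine, ok bool) {
	fields := strings.Fields(s)
	if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
		return parsedLine{}, false
	}
	if strings.HasPrefix(fields[0], macroPrefix) {
		name := strings.TrimPrefix(fields[0], macroPrefix)
		return parsedLine{Macro: name, Attrs: fields[1:]}, true
	}
	return parsedLine{Pattern: fields[0], Attrs: fields[1:]}, true
}
```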
When processing .gitattributes files, it's possible to have macros defined. Add a macro processor class that takes lists of attribute lines and expands them using macros, optionally reading in any macros from the file. This design allows us to process macros for some files and not others, like Git does.
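A minimal Go sketch of the macro processor idea. It uses hypothetical names (line, macroProcessor, processLines) rather than the actual git-lfs types. The readMacros flag lets a caller honour definitions from one file and ignore them in another. This sketch also keeps the macro name itself alongside its expansion.

```go
package main

// line is a hypothetical gitattributes line: Macro is non-empty for an
// "[attr]name" definition, otherwise Pattern is the path pattern.
type line struct {
	Pattern, Macro string
	Attrs          []string
}

// macroProcessor remembers macro definitions and expands references.
type macroProcessor struct {
	macros map[string][]string
}

func newMacroProcessor() *macroProcessor {
	return &macroProcessor{macros: map[string][]string{}}
}

// processLines returns only pattern lines, with macro references expanded.
// Definitions are recorded only when readMacros is true.
func (mp *macroProcessor) processLines(lines []line, readMacros bool) []line {
	var out []line
	for _, l := range lines {
		if l.Macro != "" {
			if readMacros {
				mp.macros[l.Macro] = l.Attrs
			}
			continue // definitions never apply to paths themselves
		}
		var attrs []string
		for _, a := range l.Attrs {
			attrs = append(attrs, mp.macros[a]...) // nil if a is not a macro
			attrs = append(attrs, a)               // keep the reference itself
		}
		out = append(out, line{Pattern: l.Pattern, Attrs: attrs})
	}
	return out
}
```

Because the processor is a value that outlives a single call, the same instance can be reused across files. Macros recorded from one file then remain available when later files are processed.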
Git supports macros in top-level gitattributes files only, not in files in working tree subdirectories. When enumerating gitattributes files in the working tree, determine whether we should read macros from this file as well. Note that we always read macros from the system and global files, so don't touch these code paths for now; we'll update them in a future commit.
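Assuming paths are slash-separated and relative to the repository root, the working-tree decision reduces to a check like the hypothetical readsMacros helper below. System and global files are handled separately, as noted above.

```go
package main

import "path"

// readsMacros reports whether macro definitions should be honoured in a
// working-tree .gitattributes file: only the top-level file qualifies.
func readsMacros(relPath string) bool {
	return path.Clean(relPath) == ".gitattributes"
}
```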
Read and expand macros when reading the gitattributes files so that users can use shorthands such as "lfs" for "filter=lfs diff=lfs merge=lfs". Pass the same macro processor into each attribute lookup function, since macros can be defined in the system or global file and then used in per-repository files.
Several different configuration files used by Git can be located in a directory specified by the environment variable $XDG_CONFIG_HOME. Using this path allows the user to specify a desired place for their configuration files. Add a function to look up paths that use this directory and add tests for it. Note that we use the slash syntax even on Windows when invoking ExpandPath, as that function doesn't accept backslashes.
Git documents that, if core.attributesFile is not specified, the default global attributes file is $XDG_CONFIG_HOME/git/attributes, or $HOME/.config/git/attributes if $XDG_CONFIG_HOME is not set or empty. Teach our attributes file code about this file and read it as normal.
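A minimal Go sketch of the lookup order described in these two commits. getenv is passed in place of os.Getenv so the logic can be tested. The function names are hypothetical. Slash-separated paths are used, as in the previous commit, and tilde expansion of core.attributesFile (the ExpandPath step) is left out.

```go
package main

import "path"

// globalAttributesFile returns core.attributesFile if it is set ("" means
// unset), otherwise the XDG default location for Git's attributes file.
func globalAttributesFile(coreAttributesFile string, getenv func(string) string) string {
	if coreAttributesFile != "" {
		return coreAttributesFile
	}
	return xdgConfigPath(getenv, "git/attributes")
}

// xdgConfigPath resolves sub under $XDG_CONFIG_HOME, falling back to
// $HOME/.config when XDG_CONFIG_HOME is unset or empty.
func xdgConfigPath(getenv func(string) string, sub string) string {
	base := getenv("XDG_CONFIG_HOME")
	if base == "" {
		base = path.Join(getenv("HOME"), ".config")
	}
	return path.Join(base, sub)
}
```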
Currently, when checking for files being tracked, we don't read gitattributes files that are not in the working directory, so we don't expand any macros that might be in them. Ensure that when we read any gitattributes file, we read all of them, and in the right order, to expand macros in later files that have been defined in earlier ones.
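The hypothetical Go sketch below shows why the reading order matters. It threads one macro table through a list of files processed in sequence. The rule, attrFile, and expandAll names are assumptions. The two-file order in the test (a global file before the top-level .gitattributes) is only an example, not the full order Git uses.

```go
package main

// rule is one line of a hypothetical in-memory gitattributes file: macro
// is set for an "[attr]" definition, pattern for everything else.
type rule struct {
	pattern string
	macro   string
	attrs   []string
}

// attrFile groups rules with a flag saying whether macro definitions in
// this file should be honoured.
type attrFile struct {
	name       string
	readMacros bool
	rules      []rule
}

// expandAll processes files in the given order with one shared macro
// table, so a macro defined in an earlier file can be referenced from a
// later one. Results are keyed by "file:pattern".
func expandAll(files []attrFile) map[string][]string {
	macros := map[string][]string{}
	out := map[string][]string{}
	for _, f := range files {
		for _, r := range f.rules {
			if r.macro != "" {
				if f.readMacros {
					macros[r.macro] = r.attrs
				}
				continue
			}
			var attrs []string
			for _, a := range r.attrs {
				attrs = append(attrs, macros[a]...) // empty if a is not a known macro
				attrs = append(attrs, a)
			}
			out[f.name+":"+r.pattern] = attrs
		}
	}
	return out
}
```

If the files are processed in the reverse order, the reference to the macro stays unexpanded.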
In commit 1ff5254 of PR git-lfs#3391 we introduced the MacroProcessor type and methods to support the use of macro attributes in .gitattributes files. However, we do not currently support the case where a macro attribute is specified with a "!" prefix, which Git handles by setting all attributes defined by the macro attribute back to the unspecified state. (Note that the "-" prefix is not supported by Git for macro attributes, only the "!" one.) To mimic the same behaviour in Git LFS we add a check for a macro attribute with its Unspecified bool set to "true", and when this is detected we iterate through the set of attributes defined by the macro attribute and set them all to the same unspecified state. We also add tests to confirm this new handling works as expected.
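A minimal Go sketch of the expansion rule described here. It uses a simplified attr type whose Unspecified field mirrors the bool mentioned above. expandRef and the value encoding ("true" for set, "false" for unset) are assumptions, not the exact git-lfs code. A "-" reference to a macro is not expanded, in line with the note that Git does not support that prefix for macros.

```go
package main

// attr is a hypothetical attribute: when Unspecified is false, V holds
// "true" (set), "false" (unset), or an explicit value.
type attr struct {
	K           string
	V           string
	Unspecified bool
}

// expandRef expands one attribute reference against a macro table.
// "macro" yields the macro's attributes as defined; "!macro" yields the
// same keys, each marked unspecified; "-macro" is not expanded.
// The reference itself is always kept after any expansion.
func expandRef(ref attr, macros map[string][]attr) []attr {
	var out []attr
	if def, ok := macros[ref.K]; ok {
		for _, a := range def {
			switch {
			case ref.Unspecified:
				out = append(out, attr{K: a.K, Unspecified: true})
			case ref.V == "true":
				out = append(out, a)
			}
		}
	}
	return append(out, ref)
}
```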
In commit f4b8938 of PR git-lfs#3391 the t/t-attributes.sh file was added with an initial "macros" test, and part of that test confirms that macro attribute definitions are only processed when they appear in the top-level .gitattributes file in a repository. The test does confirm this in that it creates both an "lfs2" macro attribute definition and assignment of that attribute name to a file pattern in a .gitattributes file in a subdirectory, and then validates that a file matching that pattern in the subdirectory is not converted into an LFS object. The test also includes a second check of this logic in which it confirms that a "git lfs track" command for the file pattern in the subdirectory succeeds, i.e., that it does not fail because the file pattern was already assigned the normal "filter=lfs" attribute by the "lfs2" macro attribute. However, this particular check will always succeed, even if macro attribute definitions like the "lfs2" one are incorrectly accepted from .gitattributes files other than the top-level one. This is because the "git lfs track" command is run in the top-level directory and only sets a pattern that includes the subdirectory in its path (i.e., "dir/*.bin"). This will succeed regardless of whether the "*.bin" pattern is assigned to LFS attributes in the dir/.gitattributes file. We therefore make this second check more sensitive to potential future regressions by running the "git lfs track" command in the subdirectory. Now if the macro definition in the .gitattributes file in that directory is (incorrectly) read as a valid definition, and the check which tests that the file in the subdirectory has not been converted into an LFS object is skipped, then this second check fails as expected.
In commit 1ff5254 of PR git-lfs#3391 we introduced the MacroProcessor type and methods to support the use of macro attributes in .gitattributes files. However, we do not currently support the case where a macro attribute is specified with a "!" prefix, which Git handles by setting all attributes defined by the macro attribute back to the unspecified state. (Note that the "-" prefix is not supported by Git for macro attributes, only the "!" one.) To mimic the same behaviour in Git LFS we add a check for a macro attribute with its Unspecified bool set to "true", and when this is detected we iterate through the set of attributes defined by the macro attribute and set them all to the same unspecified state. We also add tests to confirm this new handling works as expected, both a new Go test and two new tests in t/t-fsck.sh. In the latter file we refactor the setup_invalid_pointers() helper function so that we can reuse some of its code in a new, smaller function that just creates invalid pointers. The new "fsck does not detect invalid pointers with negated macro patterns" test in t/t-fsck.sh will not succeed without the changes to the MacroProcessor in this commit, because without those changes any patterns that reference a macro attribute with the "!" prefix are not processed as unspecifying the macro's attributes, and so non-pointer files matching those patterns are reported as invalid LFS pointers. The "fsck detects invalid pointers with macro patterns" test, on the other hand, simply validates existing behaviour. In both of the new tests in t/t-fsck.sh we include comments describing how the "git lfs fsck" command currently processes .gitattributes files in the order returned by "git ls-tree", and so a .gitattributes file in a directory such as .dir/ will be parsed before the top-level .gitattributes one because it sorts first.
The result is that any macro attribute references in the .dir/.gitattributes file will not be resolved properly, and so our tests either skip testing this situation or they succeed but not quite for the right reasons. For now we defer resolution of this issue to the future.
In commit f4b8938 of PR git-lfs#3391 the t/t-attributes.sh test script was introduced with its initial "macros" test, which validates that the "git lfs track" command is able to parse macro attribute definitions in the top-level .gitattributes file and resolve references to those macros in the same file. It also confirms that the command does not accept macro definitions in .gitattributes files in subdirectories, as Git does not accept these either. However, Git does resolve macro attribute references from .gitattributes files in subdirectories, so long as they refer to macro attributes defined in the top-level .gitattributes (or one of the other files where definitions are accepted, such as the .git/info/attributes file). But the "git lfs track" command at present does not resolve such references consistently because it sorts the attributes files by path length and then processes them strictly in that order, from longest to shortest. Thus references to macro attributes defined in the top-level .gitattributes file from other attributes files never succeed because the top-level file is always parsed last (except for the global and system attributes files). We therefore add a note to this effect in the "macros" test to explain why we do not test valid macro attribute references in a .gitattributes file in a subdirectory. (There is also an inconsistency in how "git lfs track" handles references to macro attributes defined in the .git/info/attributes file, because if the references appear in .gitattributes files whose full file path in the repository is longer than ".git/info/attributes", then the references are not resolved as these files are parsed before the .git/info/attributes one, whereas references from other .gitattributes files are resolved.) 
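The ordering problem can be reproduced with a short Go sketch. orderByLengthDesc and definedBefore are hypothetical helpers. The first sorts paths from longest to shortest, as described above. The second reports whether a macro defined in one file has been seen by the time another file is parsed.

```go
package main

import "sort"

// orderByLengthDesc returns a copy of paths sorted from longest to
// shortest, keeping the input order for paths of equal length.
func orderByLengthDesc(paths []string) []string {
	out := append([]string(nil), paths...)
	sort.SliceStable(out, func(i, j int) bool { return len(out[i]) > len(out[j]) })
	return out
}

// definedBefore reports whether def is reached no later than use in
// order, i.e. whether a macro defined in def could be resolved in use.
func definedBefore(order []string, def, use string) bool {
	for _, p := range order {
		switch p {
		case def:
			return true
		case use:
			return false
		}
	}
	return false
}
```

With ".gitattributes" (14 characters), "dir/.gitattributes" (18) and ".git/info/attributes" (20), the top-level file comes last. A path longer than 20 characters, such as "some/dir/.gitattributes", is also parsed before .git/info/attributes, which matches the parenthetical note above.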
Separately, in commit 608bc8d of PR git-lfs#4525 support for scanning the repository contents using the output of the "git ls-tree" command was added to help enable the "git lfs fsck" command to search for invalid Git LFS pointer files. The GitScanner.ScanRefByTree() method invokes a chain of functions, of which catFileBatchTreeForPointers() reads Git blob metadata and examines each blob in turn to see if it is a Git LFS pointer or a .gitattributes file, and if it is the latter it reads and parses its contents, including macro attribute definitions if the file is the top-level .gitattributes file. We therefore add a "fsck detects invalid pointers with macro patterns" test to the t/t-fsck.sh test script which validates the ability of the "git lfs fsck" command to report as invalid pointers any files matching patterns with a "filter=lfs" attribute defined by reference to a macro attribute defined in the top-level .gitattributes file. To do this we refactor the setup_invalid_pointers() helper function so that we can reuse some of its code in a new, smaller function that just creates invalid pointers. However, we also add a note explaining that we can not yet test this behaviour with a .gitattributes file whose parent directory sorts before the top-level .gitattributes one in the output from "git ls-tree". Because that command outputs its results sorted by filepath, a file such as .dir/.gitattributes will be listed before the top-level .gitattributes file, and so any macro attribute references from the .dir/.gitattributes file to macro attributes defined in the top-level .gitattributes file will not be resolved in the way that Git resolves them. For now we defer resolution of this issue and the ones described regarding the "git lfs track" command to the future.
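For the paths involved here, sorting full paths byte-wise gives the same order as "git ls-tree -r" output. That is enough to show why .dir/.gitattributes is visited first: after the shared ".", "d" sorts before "g". The lsTreeOrder helper is hypothetical and stands in for Git's own tree ordering only for examples like these.

```go
package main

import "sort"

// lsTreeOrder returns a copy of paths sorted byte-wise, as a stand-in for
// the order in which "git ls-tree -r" lists these entries.
func lsTreeOrder(paths []string) []string {
	out := append([]string(nil), paths...)
	sort.Strings(out)
	return out
}
```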
In commit 1ff5254 of PR git-lfs#3391 we introduced the MacroProcessor type and methods to support the use of macro attributes in .gitattributes files. However, we do not currently support the case where a macro attribute is specified with a "!" prefix, which Git handles by setting all attributes defined by the macro attribute back to the unspecified state. (Note that the "-" prefix is not supported by Git for macro attributes, only the "!" one.) To mimic the same behaviour in Git LFS we add a check for a macro attribute with its Unspecified bool set to "true", and when this is detected we iterate through the set of attributes defined by the macro attribute and set them all to the same unspecified state. We also add tests to confirm this new handling works as expected, both a new Go test and a new "fsck does not detect invalid pointers with negated macro patterns" test in t/t-fsck.sh that will not succeed without the changes to the MacroProcessor in this commit. Without these changes, any patterns that reference a macro attribute with the "!" prefix are not processed as making the macro's attributes all unspecified again, and so non-pointer files matching those patterns are reported as invalid Git LFS pointers. In the new test in t/t-fsck.sh we include comments describing how the "git lfs fsck" command currently processes .gitattributes files in the order returned by "git ls-tree", and so a .gitattributes file in a subdirectory such as .dir/ will be parsed before the top-level .gitattributes one because it appears first in the "git ls-tree" output. The result is that any macro attribute references in the .dir/.gitattributes file will not be resolved properly, and so our test succeeds but not quite for the right reasons.
We also add a new "macros with unspecified flag" test in the t/t-attributes.sh test script, but this test ultimately is only a placeholder as it can not actually test that the "git lfs track" command will not overwrite a pattern in a .gitattributes file in a subdirectory if it references a macro attribute defined in the top-level .gitattributes file and the reference has the "!" prefix. This is due to the fact that the "git lfs track" command parses .gitattributes files in the order of the length of their full paths, from longest to shortest, and so macro attribute references can not be resolved except within the top-level .gitattributes file (with some caveats regarding the .git/info/attributes file and the global and system attributes files). For now we defer resolution of both this issue and the one described regarding the "git lfs fsck" command to the future.
Support for Git macro attributes was added in a series of commits in PR git-lfs#3391, including commit 9d3e52d, where the Line structure of the "git/gitattr" package was updated to include a Macro element which would be set non-nil for macro definition lines in a .gitattributes file, while the existing Pattern element would be set non-nil for all other lines. The "git lfs track" command, among others, was then adjusted to create a MacroProcessor structure (from the same "git/gitattr" package) and call its ProcessLines() method to resolve any macro references and thus convert the "raw" parsed Line structures into a set for which the Pattern element was always non-nil, and no Macro elements appeared. Later, the "git lfs fsck" command gained the ability to process macro definitions in .gitattributes files, in PR git-lfs#4525. However, the "git lfs migrate import" command was not adjusted, specifically in the implementation of its "--fixup" option, which initializes a Tree structure (also of the "git/gitattr" package) for the root tree of each commit in a repository's history using the package's New() function. This function traverses all the trees in the hierarchy and finds and parses all the .gitattributes files in them. Then, when the command visits each file within the commit's tree using the Rewrite() method of the Rewriter structure in the "git/githistory" package, it calls the (*Tree).Applied() method to match the file's path against any applicable Git attributes, to see if the file should be treated as a Git LFS object. This lack of support for macro attributes in the "git lfs migrate import --fixup" command was then propagated to the "git lfs migrate info --fixup" command in commit 4800c5e of PR git-lfs#4501, when the "git lfs migrate info" command was updated to respect the --fixup option.
As a result, both of these commands (when used with the --fixup option) panic if they encounter a .gitattributes file with any macro definitions, as they call the (*Tree).Applied() method and it attempts to access the nil Pattern element of the lines with non-nil Macro elements. (Prior to the changes in commit c374d1f of PR git-lfs#5375 the "git lfs migrate import --fixup" command would then stall indefinitely, but it now also exits after the panic condition.) These problems were reported in issue git-lfs#5332. To resolve this problem and avoid similar ones in the future, we refactor the Line structure into a Line interface, which only provides an Attrs() method to retrieve a slice of Attr attributes, and no other methods. We then also define two additional interfaces, each of which embeds the Line interface, PatternLine and MacroLine, with corresponding getter methods for their respective elements. The ParseLine() function of the "git/gitattr" package now returns a slice of generic Line types, each of which is either a PatternLine or a MacroLine, but never both. Callers like the Applied() method of the Tree structure therefore need to perform type assertions or switches to determine which type of Line they are handling, which ensures they always access the line's data through safe methods. We then update the Go tests for the "git/gitattr" package as appropriate, and also add two tests each to the t/t-migrate-fixup.sh and t/t-migrate-import.sh test suites. All four of these new shell tests fail without the changes in this commit. In particular, several of these tests make sure to run the "git lfs migrate" commands outside of any shell pipeline so the test will fail if the command panics and produces no output, even if no output is the expected condition for a successful execution of the command.
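A simplified Go sketch of the interface split described above. Attr, the concrete patternLine and macroLine types, and the describe helper are hypothetical. Pattern() returns a plain string here rather than whatever matcher type the real package uses. Neither concrete type implements both interfaces, so the type switch routes each line to exactly one case, and no code path can reach a pattern that does not exist.

```go
package main

import "fmt"

// Attr is a simplified attribute (key and value).
type Attr struct{ K, V string }

// Line is the common interface: every line exposes its attributes.
type Line interface {
	Attrs() []*Attr
}

// PatternLine assigns attributes to a path pattern.
type PatternLine interface {
	Line
	Pattern() string
}

// MacroLine defines a macro attribute.
type MacroLine interface {
	Line
	Macro() string
}

type patternLine struct {
	pattern string
	attrs   []*Attr
}

func (l *patternLine) Attrs() []*Attr  { return l.attrs }
func (l *patternLine) Pattern() string { return l.pattern }

type macroLine struct {
	macro string
	attrs []*Attr
}

func (l *macroLine) Attrs() []*Attr { return l.attrs }
func (l *macroLine) Macro() string  { return l.macro }

// describe shows the type switch a caller (such as an Applied()-style
// method) would use to handle each kind of line safely.
func describe(lines []Line) []string {
	var out []string
	for _, l := range lines {
		switch l := l.(type) {
		case PatternLine:
			out = append(out, fmt.Sprintf("pattern %s (%d attrs)", l.Pattern(), len(l.Attrs())))
		case MacroLine:
			out = append(out, fmt.Sprintf("macro %s (%d attrs)", l.Macro(), len(l.Attrs())))
		}
	}
	return out
}
```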
It's possible in Git to use an attribute macro to specify a set of attributes and then apply them over a set of patterns like so:
This series introduces some refactors so that we use one set of attribute parsing code and then implements macros on top of that newly refactored base.
As a note, we need to update git-lfs/wildmatch, since the previous version doesn't properly handle escaped hash marks, leading to test failures (and in edge cases, breakage) when we standardize on using wildmatch for all patterns.
This series fixes #1719.