Conversation
|
This looks good to me. |
|
I just pushed a change that stores the skipped refs on the |
Looks sweet. I'm a little bit worried about that list being potentially large, but not sure what other alternatives I'd go for. |
|
Okay, I think this is ready to go now. |
|
As a bonus, I exported the base channel wrapper stuff to |
| s.mu.Lock() | ||
| defer s.mu.Unlock() | ||
|
|
||
| if s.closed { |
There was a problem hiding this comment.
I think you can remove the mutex lock and unlock here by using a uint32 and either storing 0 or 1 by using the sync/atomic package.
| return scanIndex(ref) | ||
| } | ||
|
|
||
| func (s *GitScanner) opts(mode ScanningMode) *ScanRefsOptions { |
There was a problem hiding this comment.
Leaving a note here that this is a temporary measure at least until we can get rid of *ScanRefsOptions.
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's internal sync.Mutex element was added to protect access to the "remote" and "skippedRefs" elements, as well as the "closed" element. The structure's methods always acquire the mutex before reading or writing these elements, particularly the "remote" one. This type of design is typical of other structures where we expect concurrent access to the structure's data, such as when multiple goroutines attempt to read and write the same data. However, in practice the GitScanner structure's "remote" and "skippedRefs" elements are only ever initialized once, and indeed only initialized in one specific use case by the (*uploadContext).buildGitScanner() method for the "git lfs push" and "git lfs pre-push" commands. Meanwhile, the "closed" element is only used in the Close() method, and this method is only ever called once per structure by our existing code. Moreover, other elements of the GitScanner structure which have been added since its original introduction are set and retrieved without locking the mutex, such as the FoundLockable and PotentialLockables elements, which are checked and read directly in the scanRefsToChan() function of the "lfs" package. As it never the case that multiple goroutines need to read and write to the "remote", "skippedRefs", and "closed" internal elements of the GitScanner structure, we can remove the mutex and the associated code, which simplifies the setup of the structure.
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's Close() method was introduced. Unlike other similar structures whose Close() methods should be called to release underlying resources such as channels or I/O streams, the (*GitScanner).Close() method serves only to output an optional performance timing trace metric. This Close() method is not called consistently; for instance, it is never called by the migrateExportCommand() function of the "git lfs migrate" command, and will be skipped by the checkoutCommand() function of the "git lfs checkout" command if an error is returned by the (*GitScanner).ScanTree() method. The utility of the performance timing metric is also undercut by the fact that some commands perform other tasks before and after calling the specific (*GitScanner).Scan*() method they invoke. And in the particular case of the "git lfs prune" command, multiple goroutines are started, each of which runs a different Scan*() method simultaneously with the others, so the final timing metric does not account for their different execution times, just the overall final timing. We can improve the value of the timing metric while also simplifying the calling convention for the GitScanner structure's methods by removing the Close() method, and tracing the performance of each Scan*() method individually. Removing the Close() method clarifies that no underlying resources must be released for the GitScanner structure, and so callers need not try to register a deferred call to the method. This parallels some other conventional Go structures, such as the Scanner structure of the "bufio" package. As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1 environment variable set now results in more detailed and useful output, for example: 12:36:51.221526 performance ScanStashed: 0.013632533 s 12:36:51.224494 performance ScanUnpushed: 0.016570280 s 12:36:51.240670 performance ScanTree: 0.017171717 s
The RevListScanner structure of the "git" package is constructed by the NewRevListScanner() function, which takes a ScanRefsOptions structure as an argument. The git.ScanRefsOptions structure encapsulates the large number of flags and other fields which a caller may potentially set for the "git rev-list" scanning operation. The "lfs" package also has a similar ScanRefsOptions structure, which is used by the various Scan*() methods of the GitScanner structure to pass optional flags and fields to the internal functions which implement the scanning operations. However, unlike the "git" package, this ScanRefsOptions structure is not used when constructing a GitScanner, and is moreover never used outside of the "lfs" package. We can therefore remove this structure in favour of a more conventional set of internal elements encapsulated directly in the GitScanner structure. This simplifies the signatures of many of the internal scan*() functions and helps clarify what parts of the GitScanner structure actually need to be exported. As part of this change, we also create a new, dedicated and un-exported nameMap structure, which holds the map of Git object SHAs to their pathspecs, as populated during scan operations. It is then utilized in the existing lockableNameSet structure, whose Check() method is called by the runCatFileBatch() and runCatFileBatchCheck() functions to collect the pathspecs of files locked by other users. Note that the intent to remove the ScanRefsOptions structure from the "lfs" package was discussed in several PRs in 2016, and in particular, this commit's changes achieve some of the goals of the never-merged PR git-lfs#1595, but without introducing an entirely generic "scanner" package. See also: git-lfs#1670 (comment)
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's Close() method was introduced. Unlike other similar structures whose Close() methods should be called to release underlying resources such as channels or I/O streams, the (*GitScanner).Close() method serves only to output an optional performance timing trace metric. This Close() method is not called consistently; for instance, it is never called by the migrateExportCommand() function of the "git lfs migrate" command, and will be skipped by the checkoutCommand() function of the "git lfs checkout" command if an error is returned by the (*GitScanner).ScanTree() method. The utility of the performance timing metric is also undercut by the fact that some commands perform other tasks before and after calling the specific (*GitScanner).Scan*() method they invoke. And in the particular case of the "git lfs prune" command, multiple goroutines are started, each of which runs a different Scan*() method simultaneously with the others, so the final timing metric does not account for their different execution times, just the overall final timing. We can improve the value of the timing metric while also simplifying the calling convention for the GitScanner structure's methods by removing the Close() method, and tracing the performance of each Scan*() method individually. Removing the Close() method clarifies that no underlying resources must be released for the GitScanner structure, and so callers need not try to register a deferred call to the method. This parallels some other conventional Go structures, such as the Scanner structure of the "bufio" package. As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1 environment variable set now results in more detailed and useful output, for example: 12:36:51.221526 performance ScanStashed: 0.013632533 s 12:36:51.224494 performance ScanUnpushed: 0.016570280 s 12:36:51.240670 performance ScanTree: 0.017171717 s
The RevListScanner structure of the "git" package is constructed by the NewRevListScanner() function, which takes a ScanRefsOptions structure as an argument. The git.ScanRefsOptions structure encapsulates the large number of flags and other fields which a caller may potentially set for the "git rev-list" scanning operation. The "lfs" package also has a similar ScanRefsOptions structure, which is used by the various Scan*() methods of the GitScanner structure to pass optional flags and fields to the internal functions which implement the scanning operations. However, unlike the "git" package, this ScanRefsOptions structure is not used when constructing a GitScanner, and is moreover never used outside of the "lfs" package. We can therefore remove this structure in favour of a more conventional set of internal elements encapsulated directly in the GitScanner structure. This simplifies the signatures of many of the internal scan*() functions and helps clarify what parts of the GitScanner structure actually need to be exported. As part of this change, we also create a new, dedicated and un-exported nameMap structure, which holds the map of Git object SHAs to their pathspecs, as populated during scan operations. It is then utilized in the existing lockableNameSet structure, whose Check() method is called by the runCatFileBatch() and runCatFileBatchCheck() functions to collect the pathspecs of files locked by other users. Note that the intent to remove the ScanRefsOptions structure from the "lfs" package was discussed in several PRs in 2016, and in particular, this commit's changes achieve some of the goals of the never-merged PR git-lfs#1595, but without introducing an entirely generic "scanner" package. See also: git-lfs#1670 (comment)
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's internal sync.Mutex element was added to protect access to the "remote" and "skippedRefs" elements, as well as the "closed" element. The structure's methods always acquire the mutex before reading or writing these elements, particularly the "remote" one. This type of design is typical of other structures where we expect concurrent access to the structure's data, such as when multiple goroutines attempt to read and write the same data. However, in practice the GitScanner structure's "remote" and "skippedRefs" elements are only ever initialized once, and indeed only initialized in one specific use case by the (*uploadContext).buildGitScanner() method for the "git lfs push" and "git lfs pre-push" commands. Meanwhile, the "closed" element is only used in the Close() method, and this method is only ever called once per structure by our existing code. Moreover, other elements of the GitScanner structure which have been added since its original introduction are set and retrieved without locking the mutex, such as the FoundLockable and PotentialLockables elements, which are checked and read directly in the scanRefsToChan() function of the "lfs" package. As it never the case that multiple goroutines need to read and write to the "remote", "skippedRefs", and "closed" internal elements of the GitScanner structure, we can remove the mutex and the associated code, which simplifies the setup of the structure.
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's Close() method was introduced. Unlike other similar structures whose Close() methods should be called to release underlying resources such as channels or I/O streams, the (*GitScanner).Close() method serves only to output an optional performance timing trace metric. This Close() method is not called consistently; for instance, it is never called by the migrateExportCommand() function of the "git lfs migrate" command, and will be skipped by the checkoutCommand() function of the "git lfs checkout" command if an error is returned by the (*GitScanner).ScanTree() method. The utility of the performance timing metric is also undercut by the fact that some commands perform other tasks before and after calling the specific (*GitScanner).Scan*() method they invoke. And in the particular case of the "git lfs prune" command, multiple goroutines are started, each of which runs a different Scan*() method simultaneously with the others, so the final timing metric does not account for their different execution times, just the overall final timing. We can improve the value of the timing metric while also simplifying the calling convention for the GitScanner structure's methods by removing the Close() method, and tracing the performance of each Scan*() method individually. Removing the Close() method clarifies that no underlying resources must be released for the GitScanner structure, and so callers need not try to register a deferred call to the method. This parallels some other conventional Go structures, such as the Scanner structure of the "bufio" package. As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1 environment variable set now results in more detailed and useful output, for example: 12:36:51.221526 performance ScanStashed: 0.013632533 s 12:36:51.224494 performance ScanUnpushed: 0.016570280 s 12:36:51.240670 performance ScanTree: 0.017171717 s
The RevListScanner structure of the "git" package is constructed by the NewRevListScanner() function, which takes a ScanRefsOptions structure as an argument. The git.ScanRefsOptions structure encapsulates the large number of flags and other fields which a caller may potentially set for the "git rev-list" scanning operation. The "lfs" package also has a similar ScanRefsOptions structure, which is used by the various Scan*() methods of the GitScanner structure to pass optional flags and fields to the internal functions which implement the scanning operations. However, unlike the "git" package, this ScanRefsOptions structure is not used when constructing a GitScanner, and is moreover never used outside of the "lfs" package. We can therefore remove this structure in favour of a more conventional set of internal elements encapsulated directly in the GitScanner structure. This simplifies the signatures of many of the internal scan*() functions and helps clarify what parts of the GitScanner structure actually need to be exported. As part of this change, we also create a new, dedicated and un-exported nameMap structure, which holds the map of Git object SHAs to their pathspecs, as populated during scan operations. It is then utilized in the existing lockableNameSet structure, whose Check() method is called by the runCatFileBatch() and runCatFileBatchCheck() functions to collect the pathspecs of files locked by other users. Note that the intent to remove the ScanRefsOptions structure from the "lfs" package was discussed in several PRs in 2016, and in particular, this commit's changes achieve some of the goals of the never-merged PR git-lfs#1595, but without introducing an entirely generic "scanner" package. See also: git-lfs#1670 (comment)
The RevListScanner structure of the "git" package is constructed by the NewRevListScanner() function, which takes a ScanRefsOptions structure as an argument. The git.ScanRefsOptions structure encapsulates the large number of flags and other fields which a caller may potentially set for the "git rev-list" scanning operation. The "lfs" package also has a similar ScanRefsOptions structure, which is used by the various Scan*() methods of the GitScanner structure to pass optional flags and fields to the internal functions which implement the scanning operations. However, unlike the "git" package, this ScanRefsOptions structure is not used when constructing a GitScanner, and is moreover never used outside of the "lfs" package. We can therefore remove this structure in favour of a more conventional set of internal elements encapsulated directly in the GitScanner structure. This simplifies the signatures of many of the internal scan*() functions and helps clarify what parts of the GitScanner structure actually need to be exported. As part of this change, we also create a new, dedicated and un-exported nameMap structure, which holds the map of Git object SHAs to their pathspecs, as populated during scan operations. It is then utilized in the existing lockableNameSet structure, whose Check() method is called by the runCatFileBatch() and runCatFileBatchCheck() functions to collect the pathspecs of files locked by other users. Note that the intent to remove the ScanRefsOptions structure from the "lfs" package was discussed in several PRs in 2016, and in particular, this commit's changes achieve some of the goals of the never-merged PR git-lfs#1595, but without introducing an entirely generic "scanner" package. See also: git-lfs#1670 (comment)
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's internal sync.Mutex element was added to protect access to the "remote" and "skippedRefs" elements, as well as the "closed" element. The structure's methods always acquire the mutex before reading or writing these elements, particularly the "remote" one. This type of design is typical of other structures where we expect concurrent access to the structure's data, such as when multiple goroutines attempt to read and write the same data. However, in practice the GitScanner structure's "remote" and "skippedRefs" elements are only ever initialized once, and indeed only initialized in one specific use case by the (*uploadContext).buildGitScanner() method for the "git lfs push" and "git lfs pre-push" commands. Meanwhile, the "closed" element is only used in the Close() method, and this method is only ever called once per structure by our existing code. Moreover, other elements of the GitScanner structure which have been added since its original introduction are set and retrieved without locking the mutex, such as the FoundLockable and PotentialLockables elements, which are checked and read directly in the scanRefsToChan() function of the "lfs" package. As it never the case that multiple goroutines need to read and write to the "remote", "skippedRefs", and "closed" internal elements of the GitScanner structure, we can remove the mutex and the associated code, which simplifies the setup of the structure.
The GitScanner structure and its methods were introduced in PR git-lfs#1670, and in commit bdbca39 of that PR the structure's Close() method was introduced. Unlike other similar structures whose Close() methods should be called to release underlying resources such as channels or I/O streams, the (*GitScanner).Close() method serves only to output an optional performance timing trace metric. This Close() method is not called consistently; for instance, it is never called by the migrateExportCommand() function of the "git lfs migrate" command, and will be skipped by the checkoutCommand() function of the "git lfs checkout" command if an error is returned by the (*GitScanner).ScanTree() method. The utility of the performance timing metric is also undercut by the fact that some commands perform other tasks before and after calling the specific (*GitScanner).Scan*() method they invoke. And in the particular case of the "git lfs prune" command, multiple goroutines are started, each of which runs a different Scan*() method simultaneously with the others, so the final timing metric does not account for their different execution times, just the overall final timing. We can improve the value of the timing metric while also simplifying the calling convention for the GitScanner structure's methods by removing the Close() method, and tracing the performance of each Scan*() method individually. Removing the Close() method clarifies that no underlying resources must be released for the GitScanner structure, and so callers need not try to register a deferred call to the method. This parallels some other conventional Go structures, such as the Scanner structure of the "bufio" package. As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1 environment variable set now results in more detailed and useful output, for example: 12:36:51.221526 performance ScanStashed: 0.013632533 s 12:36:51.224494 performance ScanUnpushed: 0.016570280 s 12:36:51.240670 performance ScanTree: 0.017171717 s
The RevListScanner structure of the "git" package is constructed by the NewRevListScanner() function, which takes a ScanRefsOptions structure as an argument. The git.ScanRefsOptions structure encapsulates the large number of flags and other fields which a caller may potentially set for the "git rev-list" scanning operation. The "lfs" package also has a similar ScanRefsOptions structure, which is used by the various Scan*() methods of the GitScanner structure to pass optional flags and fields to the internal functions which implement the scanning operations. However, unlike the "git" package, this ScanRefsOptions structure is not used when constructing a GitScanner, and is moreover never used outside of the "lfs" package. We can therefore remove this structure in favour of a more conventional set of internal elements encapsulated directly in the GitScanner structure. This simplifies the signatures of many of the internal scan*() functions and helps clarify what parts of the GitScanner structure actually need to be exported. As part of this change, we also create a new, dedicated and un-exported nameMap structure, which holds the map of Git object SHAs to their pathspecs, as populated during scan operations. It is then utilized in the existing lockableNameSet structure, whose Check() method is called by the runCatFileBatch() and runCatFileBatchCheck() functions to collect the pathspecs of files locked by other users. Note that the intent to remove the ScanRefsOptions structure from the "lfs" package was discussed in several PRs in 2016, and in particular, this commit's changes achieve some of the goals of the never-merged PR git-lfs#1595, but without introducing an entirely generic "scanner" package. See also: git-lfs#1670 (comment)
This PR builds on #1650 and introduces a
GitScannertype. It's pretty empty, but it illustrates where I want to go with #1649. As of this PR, nothing outside of thelfspackage useslfs.ScanRefs()orlfs.ScanRefsToChan().TODO
master