Skip to content

experimental gitscanner api#1670

Merged
technoweenie merged 25 commits intomasterfrom
gitscanner-exported-api
Nov 18, 2016
Merged

experimental gitscanner api#1670
technoweenie merged 25 commits intomasterfrom
gitscanner-exported-api

Conversation

@technoweenie
Copy link
Contributor

@technoweenie technoweenie commented Nov 16, 2016

This PR builds on #1650 and introduces a GitScanner type. It's pretty empty, but it illustrates where I want to go with #1649. As of this PR, nothing outside of the lfs package uses lfs.ScanRefs() or lfs.ScanRefsToChan().

TODO

@technoweenie technoweenie added this to the v2.0.0 milestone Nov 16, 2016
@ttaylorr
Copy link
Contributor

This looks good to me.

@technoweenie
Copy link
Contributor Author

I just pushed a change that stores the skipped refs on the *GitScanner. This means that initial git pushes with lots of refs will only calculate this data once, instead of on every lfs.ScanRefs() or (*GitScanner) ScanLeftToRemote() call.

@ttaylorr
Copy link
Contributor

I just pushed a change that stores the skipped refs on the *GitScanner.

Looks sweet. I'm a little bit worried about that list being potentially large, but not sure what other alternatives I'd go for.

@technoweenie technoweenie changed the base branch from catfilebatch-v2 to master November 16, 2016 23:50
@technoweenie
Copy link
Contributor Author

Okay, I think this is ready to go now. lfs/scanner.go has no more public functions. Everything is using the new gitscanner api. I'd like to merge this so I can continue on more focused refactoring.

@technoweenie
Copy link
Contributor Author

As a bonus, I exported the base channel wrapper stuff to tools (from #1625).

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge!

s.mu.Lock()
defer s.mu.Unlock()

if s.closed {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the mutex lock and unlock here by using a uint32 and either storing 0 or 1 by using the sync/atomic package.

return scanIndex(ref)
}

func (s *GitScanner) opts(mode ScanningMode) *ScanRefsOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a note here that this is a temporary measure at least until we can get rid of *ScanRefsOptions.

@technoweenie technoweenie merged commit 13018d7 into master Nov 18, 2016
@technoweenie technoweenie deleted the gitscanner-exported-api branch November 18, 2016 20:48
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 6, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's internal sync.Mutex element was added to protect access to
the "remote" and "skippedRefs" elements, as well as the "closed" element.
The structure's methods always acquire the mutex before reading or writing
these elements, particularly the "remote" one.

This type of design is typical of other structures where we expect
concurrent access to the structure's data, such as when multiple
goroutines attempt to read and write the same data.

However, in practice the GitScanner structure's "remote" and "skippedRefs"
elements are only ever initialized once, and indeed only initialized in
one specific use case by the (*uploadContext).buildGitScanner() method
for the "git lfs push" and "git lfs pre-push" commands.  Meanwhile,
the "closed" element is only used in the Close() method, and this
method is only ever called once per structure by our existing code.

Moreover, other elements of the GitScanner structure which have been
added since its original introduction are set and retrieved without
locking the mutex, such as the FoundLockable and PotentialLockables
elements, which are checked and read directly in the scanRefsToChan()
function of the "lfs" package.

As it never the case that multiple goroutines need to read and write
to the "remote", "skippedRefs", and "closed" internal elements of the
GitScanner structure, we can remove the mutex and the associated code,
which simplifies the setup of the structure.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 6, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's Close() method was introduced.  Unlike other similar
structures whose Close() methods should be called to release underlying
resources such as channels or I/O streams, the (*GitScanner).Close()
method serves only to output an optional performance timing trace metric.

This Close() method is not called consistently; for instance, it is never
called by the migrateExportCommand() function of the "git lfs migrate"
command, and will be skipped by the checkoutCommand() function of the
"git lfs checkout" command if an error is returned by the
(*GitScanner).ScanTree() method.

The utility of the performance timing metric is also undercut by the
fact that some commands perform other tasks before and after calling
the specific (*GitScanner).Scan*() method they invoke.  And in the
particular case of the "git lfs prune" command, multiple goroutines
are started, each of which runs a different Scan*() method simultaneously
with the others, so the final timing metric does not account for
their different execution times, just the overall final timing.

We can improve the value of the timing metric while also simplifying
the calling convention for the GitScanner structure's methods by
removing the Close() method, and tracing the performance of each
Scan*() method individually.

Removing the Close() method clarifies that no underlying resources
must be released for the GitScanner structure, and so callers need
not try to register a deferred call to the method.  This parallels
some other conventional Go structures, such as the Scanner structure
of the "bufio" package.

As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1
environment variable set now results in more detailed and useful output,
for example:

  12:36:51.221526 performance ScanStashed: 0.013632533 s
  12:36:51.224494 performance ScanUnpushed: 0.016570280 s
  12:36:51.240670 performance ScanTree: 0.017171717 s
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 6, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's Close() method was introduced.  Unlike other similar
structures whose Close() methods should be called to release underlying
resources such as channels or I/O streams, the (*GitScanner).Close()
method serves only to output an optional performance timing trace metric.

This Close() method is not called consistently; for instance, it is never
called by the migrateExportCommand() function of the "git lfs migrate"
command, and will be skipped by the checkoutCommand() function of the
"git lfs checkout" command if an error is returned by the
(*GitScanner).ScanTree() method.

The utility of the performance timing metric is also undercut by the
fact that some commands perform other tasks before and after calling
the specific (*GitScanner).Scan*() method they invoke.  And in the
particular case of the "git lfs prune" command, multiple goroutines
are started, each of which runs a different Scan*() method simultaneously
with the others, so the final timing metric does not account for
their different execution times, just the overall final timing.

We can improve the value of the timing metric while also simplifying
the calling convention for the GitScanner structure's methods by
removing the Close() method, and tracing the performance of each
Scan*() method individually.

Removing the Close() method clarifies that no underlying resources
must be released for the GitScanner structure, and so callers need
not try to register a deferred call to the method.  This parallels
some other conventional Go structures, such as the Scanner structure
of the "bufio" package.

As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1
environment variable set now results in more detailed and useful output,
for example:

  12:36:51.221526 performance ScanStashed: 0.013632533 s
  12:36:51.224494 performance ScanUnpushed: 0.016570280 s
  12:36:51.240670 performance ScanTree: 0.017171717 s
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's internal sync.Mutex element was added to protect access to
the "remote" and "skippedRefs" elements, as well as the "closed" element.
The structure's methods always acquire the mutex before reading or writing
these elements, particularly the "remote" one.

This type of design is typical of other structures where we expect
concurrent access to the structure's data, such as when multiple
goroutines attempt to read and write the same data.

However, in practice the GitScanner structure's "remote" and "skippedRefs"
elements are only ever initialized once, and indeed only initialized in
one specific use case by the (*uploadContext).buildGitScanner() method
for the "git lfs push" and "git lfs pre-push" commands.  Meanwhile,
the "closed" element is only used in the Close() method, and this
method is only ever called once per structure by our existing code.

Moreover, other elements of the GitScanner structure which have been
added since its original introduction are set and retrieved without
locking the mutex, such as the FoundLockable and PotentialLockables
elements, which are checked and read directly in the scanRefsToChan()
function of the "lfs" package.

As it never the case that multiple goroutines need to read and write
to the "remote", "skippedRefs", and "closed" internal elements of the
GitScanner structure, we can remove the mutex and the associated code,
which simplifies the setup of the structure.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's Close() method was introduced.  Unlike other similar
structures whose Close() methods should be called to release underlying
resources such as channels or I/O streams, the (*GitScanner).Close()
method serves only to output an optional performance timing trace metric.

This Close() method is not called consistently; for instance, it is never
called by the migrateExportCommand() function of the "git lfs migrate"
command, and will be skipped by the checkoutCommand() function of the
"git lfs checkout" command if an error is returned by the
(*GitScanner).ScanTree() method.

The utility of the performance timing metric is also undercut by the
fact that some commands perform other tasks before and after calling
the specific (*GitScanner).Scan*() method they invoke.  And in the
particular case of the "git lfs prune" command, multiple goroutines
are started, each of which runs a different Scan*() method simultaneously
with the others, so the final timing metric does not account for
their different execution times, just the overall final timing.

We can improve the value of the timing metric while also simplifying
the calling convention for the GitScanner structure's methods by
removing the Close() method, and tracing the performance of each
Scan*() method individually.

Removing the Close() method clarifies that no underlying resources
must be released for the GitScanner structure, and so callers need
not try to register a deferred call to the method.  This parallels
some other conventional Go structures, such as the Scanner structure
of the "bufio" package.

As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1
environment variable set now results in more detailed and useful output,
for example:

  12:36:51.221526 performance ScanStashed: 0.013632533 s
  12:36:51.224494 performance ScanUnpushed: 0.016570280 s
  12:36:51.240670 performance ScanTree: 0.017171717 s
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's internal sync.Mutex element was added to protect access to
the "remote" and "skippedRefs" elements, as well as the "closed" element.
The structure's methods always acquire the mutex before reading or writing
these elements, particularly the "remote" one.

This type of design is typical of other structures where we expect
concurrent access to the structure's data, such as when multiple
goroutines attempt to read and write the same data.

However, in practice the GitScanner structure's "remote" and "skippedRefs"
elements are only ever initialized once, and indeed only initialized in
one specific use case by the (*uploadContext).buildGitScanner() method
for the "git lfs push" and "git lfs pre-push" commands.  Meanwhile,
the "closed" element is only used in the Close() method, and this
method is only ever called once per structure by our existing code.

Moreover, other elements of the GitScanner structure which have been
added since its original introduction are set and retrieved without
locking the mutex, such as the FoundLockable and PotentialLockables
elements, which are checked and read directly in the scanRefsToChan()
function of the "lfs" package.

As it never the case that multiple goroutines need to read and write
to the "remote", "skippedRefs", and "closed" internal elements of the
GitScanner structure, we can remove the mutex and the associated code,
which simplifies the setup of the structure.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The GitScanner structure and its methods were introduced in PR git-lfs#1670,
and in commit bdbca39 of that PR
the structure's Close() method was introduced.  Unlike other similar
structures whose Close() methods should be called to release underlying
resources such as channels or I/O streams, the (*GitScanner).Close()
method serves only to output an optional performance timing trace metric.

This Close() method is not called consistently; for instance, it is never
called by the migrateExportCommand() function of the "git lfs migrate"
command, and will be skipped by the checkoutCommand() function of the
"git lfs checkout" command if an error is returned by the
(*GitScanner).ScanTree() method.

The utility of the performance timing metric is also undercut by the
fact that some commands perform other tasks before and after calling
the specific (*GitScanner).Scan*() method they invoke.  And in the
particular case of the "git lfs prune" command, multiple goroutines
are started, each of which runs a different Scan*() method simultaneously
with the others, so the final timing metric does not account for
their different execution times, just the overall final timing.

We can improve the value of the timing metric while also simplifying
the calling convention for the GitScanner structure's methods by
removing the Close() method, and tracing the performance of each
Scan*() method individually.

Removing the Close() method clarifies that no underlying resources
must be released for the GitScanner structure, and so callers need
not try to register a deferred call to the method.  This parallels
some other conventional Go structures, such as the Scanner structure
of the "bufio" package.

As well, running a "git lfs prune" command with the GIT_TRACE_PERFORMANCE=1
environment variable set now results in more detailed and useful output,
for example:

  12:36:51.221526 performance ScanStashed: 0.013632533 s
  12:36:51.224494 performance ScanUnpushed: 0.016570280 s
  12:36:51.240670 performance ScanTree: 0.017171717 s
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants