Skip to content

progress: use git/githistory/log package for formatting#2732

Merged
ttaylorr merged 14 commits intomasterfrom
progress-log
Nov 21, 2017
Merged

progress: use git/githistory/log package for formatting#2732
ttaylorr merged 14 commits intomasterfrom
progress-log

Conversation

@ttaylorr
Copy link
Contributor

This pull request extends the progress.Meter interface to implement git/githistory/log.Task so that the progress.Meter can be driven from a *git/githistory/log.Logger.

Being able to use the various implementations of progress.Meter within the git/githistory/log sub-package has the following benefits:

  • Separating the concept of generating a message and printing it.
  • Eliminating IO-races, where two things want to log at the same time and the message is garbled, or out of order.
  • Consistent use of the git/githistory/log package, which makes changing IO behavior in this context easier.

Since it's been awhile for me, I wanted to write down some of the ideas in the git/githistory/log package here:

  1. The *Logger logs Tasks, which provide a channel of updates, and a Throttled() method, telling the logger whether or not they should be throttled.
  2. The *Logger logs a Task until completion, enqueuing and subsequently printing messages from other Tasks only after the currently running Task is complete. It buffers further tasks in a pseudo-infinite buffer.

This pull request seeks to achieve the above benefits by implementing log.Task in progress.Meter, making it possible to log.NewLogger(...).Enqueue(progress.NewMeter()). The approach towards doing this was done in a minimal fashion, keeping the majority of the formatting code as-is, but adding a new update() method, which sends a new update after each time a method that would change the output of the progress meter is called.

For instance:

func (p *ProgressMeter) Add(size int64) {
	defer p.update()
 	atomic.AddInt32(&p.estimatedFiles, 1)
 	atomic.AddInt64(&p.estimatedBytes, size)
}

will modify the *ProgressMeter's state, then issue a deferred p.update() call, which will construct a new update, and send it. The *log.Logger will either accept it, or drop it on the floor if the *ProgressMeter is generating too many updates.

This had some other unexpected benefits like removing the for _ = range time.NewTicker(...).C idiom, and simplifying the Start() and Pause() implementations.

/cc @git-lfs/core @technoweenie

@ttaylorr ttaylorr added this to the v2.4.0 milestone Nov 16, 2017
return a
}
return b
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why'd you pull this out? Is there a circular dependency now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, this happened when importing the git/githistory/log package from progress, since both of those packages import tools. This is the only reference to the tools package inside of git/githistory/log, so this was sufficient to remove the cyclic dependency.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a minor comment, but this otherwise looks good. I wrote down some thoughts about further changes to the LFS output to explore in #2743.

// possibly be transferred. If a file doesn't need to be transferred for some
// reason, be sure to call Skip(int64) with the same size.
func (p *ProgressMeter) Add(size int64) {
defer p.update()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of using defer here? Couldn't you just put p.update() at the end of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any semantic benefit to having this be a deferred function call, but I like that the defer keyword allows us to put the p.update() call near the top of the method as a sort of way to annotate "hey, this is going to produce some updates that will change the output of this on the terminal". Deferred calls are pretty fast nowadays (and this code is running in the context of uploading/downloading large files over the network, so I think that the performance concern would be negligible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just leave the defer off then and keep it at the top? It looks like the only reason it'd hang is if the channel is full, in which case it's okay to delay the counter update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When update() is deferred, it reflects an update that includes the changes made by calling Add(), Skip(), etc. I think that it would be fine to run it before, we would just get something that says Git LFS: 0 of 0 files ..., and would have to update() once more when calling Finish().

I don't feel strongly either way, so am happy to change it either way!

if runtime.GOOS == "windows" {
// Windows console sucks, can't do nice check mark except in ConEmu (not cmd or git bash)
// So play it safe & boring
// Windows' console can't render UTF-8 check marks outside of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, thanks for replacing this with a more positive comment ;)

@ttaylorr ttaylorr merged commit 34395bb into master Nov 21, 2017
@ttaylorr ttaylorr deleted the progress-log branch November 21, 2017 00:08
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 26, 2023
In commit f100293 of PR git-lfs#2732 a local
maxInt() function was added to the "git/githistory/log" package in order
to avoid a circular dependancy on the "tools" package, which provides
a common MaxInt() implementation.  See, for reference, the discussion
in this PR comment:

git-lfs@f100293#r151551639

The "git/githistory/log" package was subsequently renamed to
the "tasklog" package in commits b9ab79e
and 45c580e of PR git-lfs#2747.

At present, though, there is no cyclic dependency which prevents
the use of the standard tools.MaxInt() function in the "tasklog"
package, so we revert to its use now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 26, 2023
In commit e90d54d of PR git-lfs#2329 the
"git/githistory/log" package was added, including the PercentageTask
structure and associated methods, but unlike the WaitingTask which
was added at the same time, no Complete() method was defined for
the PercentageTask structure.

(Note that the "git/githistory/log" package was later renamed to the
"tasklog" package in commits b9ab79e
and 45c580e of PR git-lfs#2747.)

Other task types such as the ListTask and SimpleTask structures
(introduced in commit 31ffeb9 of
PR git-lfs#2335 and commit 7a760b6 of PR git-lfs#2756,
respectively) provide a Complete() method, and the Meter task type
of the "tq" package, which implemented the Task interface in commit
7c0f9e2 of PR git-lfs#2732, provides an
equivalent Finish() method.  These methods allow the caller to
explicitly close the channel returned by the Updates() method,
on which the anonymous goroutine started by the Logger.consume() method
for the task is waiting, in a "range" loop on the channel in the
Logger.logTask() method.

One key use of the PercentageTask structure is in the methods of
the Rewriter structure from the "git/githistory" package.  It is
initialized with the number of commits to be rewritten during
a "git lfs migrate" command's traversal of a Git repository's history.
As each commit is rewritten, the Count() method is called to
increment the percentage completed value.  When every commit has
been rewritten, the count reaches the expected total, and the
Count() method closes the channel, thus allowing the task receiving
updates to finish and exit its goroutine.

Under exceptional circumstances, though, the Rewrite() method of
the Rewriter structure may never finish iterating through all
the expected commits, and return in error or via a panic call.
When this happens, the goroutine waiting on the PercentageTask's
updates channel can never exit, leading to a hung process which
never fully exits.

In order to allow the Rewriter's Rewrite() method to define a
deferred function which will always be called when it returns,
under any circumstances, we add a Complete() method for the
PercentageTask structure which sets the number of completed elements
to the expected total in an atomic swap, and then closes the
updates channel if the number of completed elements was previously
less than the expected total.

We also add a test to validate this new method's behaviour, and
update the existing TestPercentageTaskCallsDoneWhenComplete()
function to also confirm that calling the new Complete() method
after the number of completed elements has reached the expected
total does not cause a second attempt to close the updates channel.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 26, 2023
In commit f100293 of PR git-lfs#2732 a local
maxInt() function was added to the "git/githistory/log" package in order
to avoid a circular dependancy on the "tools" package, which provides
a common MaxInt() implementation.  See, for reference, the discussion
in this PR comment:

git-lfs@f100293#r151551639

The "git/githistory/log" package was subsequently renamed to
the "tasklog" package in commits b9ab79e
and 45c580e of PR git-lfs#2747.

At present, though, there is no cyclic dependency which prevents
the use of the standard tools.MaxInt() function in the "tasklog"
package, so we revert to its use now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 26, 2023
In commit e90d54d of PR git-lfs#2329 the
"git/githistory/log" package was added, including the PercentageTask
structure and associated methods, but unlike the WaitingTask which
was added at the same time, no Complete() method was defined for
the PercentageTask structure.

(Note that the "git/githistory/log" package was later renamed to the
"tasklog" package in commits b9ab79e
and 45c580e of PR git-lfs#2747.)

Other task types such as the ListTask and SimpleTask structures
(introduced in commit 31ffeb9 of
PR git-lfs#2335 and commit 7a760b6 of PR git-lfs#2756,
respectively) provide a Complete() method, and the Meter task type
of the "tq" package, which implemented the Task interface in commit
7c0f9e2 of PR git-lfs#2732, provides an
equivalent Finish() method.  These methods allow the caller to
explicitly close the channel returned by the Updates() method,
on which the anonymous goroutine started by the Logger.consume() method
for the task is waiting, in a "range" loop on the channel in the
Logger.logTask() method.

One key use of the PercentageTask structure is in the methods of
the Rewriter structure from the "git/githistory" package.  It is
initialized with the number of commits to be rewritten during
a "git lfs migrate" command's traversal of a Git repository's history.
As each commit is rewritten, the Count() method is called to
increment the percentage completed value.  When every commit has
been rewritten, the count reaches the expected total, and the
Count() method closes the channel, thus allowing the task receiving
updates to finish and exit its goroutine.

Under exceptional circumstances, though, the Rewrite() method of
the Rewriter structure may never finish iterating through all
the expected commits, and return in error or via a panic call.
When this happens, the goroutine waiting on the PercentageTask's
updates channel can never exit, leading to a hung process which
never fully exits.

In order to allow the Rewriter's Rewrite() method to define a
deferred function which will always be called when it returns,
under any circumstances, we add a Complete() method for the
PercentageTask structure which sets the number of completed elements
to the expected total in an atomic swap, and then closes the
updates channel if the number of completed elements was previously
less than the expected total.

We also add a test to validate this new method's behaviour, and
update the existing TestPercentageTaskCallsDoneWhenComplete()
function to also confirm that calling the new Complete() method
after the number of completed elements has reached the expected
total does not cause a second attempt to close the updates channel.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants