Skip to content

commands: add git-lfs-migrate(1) 'info' subcommand#2313

Merged
ttaylorr merged 42 commits intomasterfrom
migrate-subcommand-info
Jun 16, 2017
Merged

commands: add git-lfs-migrate(1) 'info' subcommand#2313
ttaylorr merged 42 commits intomasterfrom
migrate-subcommand-info

Conversation

@ttaylorr
Copy link
Contributor

This pull request adds documentation and an implementation for the 'info' subcommand of the git-lfs-migrate(1) suite.

The 'info' subcommand is designed to gather statistics regarding the size of data stored in Git per filepath extension. The output of the command looks as follows:

~/g/git-lfs (migrate-subcommand-info) $ git-lfs migrate info \
  --include-ref=refs/heads/migrate-subcommand-info \
  --exclude-ref=refs/heads/master \
  --top=3
*.go 2.0 MB
*.sh 325.3 KB
*.md 225.6 KB

The following arguments are permitted:

  • --include-ref: a set of references of which commits reachable by those refs are included in the traversal.
  • --exclude-ref: a set of references of which commits reachable by those refs are excluded in the traversal.
  • --include: a filepath filter include set
  • --exclude: a filepath filter exclude set
  • <branch ...>: an optional set of branches to include, defaulting to the currently checked out branch, or those references given in --{include,exclude}-ref.

Here's a summary of the changes:

  • fd837f4...11b8d1f: document and specify command arguments in the man pages
  • b47255d: add scaffolding for subcommands to git-lfs-migrate(1).
  • 58fda2d...1a67cec: helper functions necessary to complete a migration.
  • d2c6b0c: teach tools.ClampInt.
  • 1612c44: add test fixtures for the info subcommand
  • af3871b: implement and test the info subcommand

This is the first command to be implemented as described in the git-lfs-migrate(1) proposal issue: (see discussion: #2146).


/cc @git-lfs/core

@ttaylorr ttaylorr added this to the v2.2.0 milestone Jun 10, 2017
@ttaylorr ttaylorr requested a review from technoweenie June 10, 2017 01:15
Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, but I have some minor review comments.

One thing: I don't think showing cumulative file size for a file type is that helpful. How can the info command help you decide what to migrate?

  • Allow user to pass in some kind of arbitrary size threshold. git lfs migrate info --size=5MB
  • Show some kind of histogram of file sizes. Median, 75/95/99 percentile?

// `git-lfs-migrate(1)` command as a subcommand (child).

for _, subcommand := range []*cobra.Command{
MigrateInfoCommand,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to declare this MigrateInfoCommand var in commands/command_migrate_info.go if some of the flags are added here. They should all be set up in this init() func. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a great idea; it should make readers more easily aware of the various migrate subcommands. I made this change via: 1f443d0.

## MODES

* `info`
Perform no migration, instead show information about repository size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you trim the first part of this sentence. Talk about what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I much prefer your wording: d453d9d.


* [branch ...]:
Migrate only the set of branches listed. If not given, `git-lfs-migrate(1)`
will the currently checked out branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're missing a word here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅: 471fc9e.

@ttaylorr
Copy link
Contributor Author

@technoweenie good call re: surfacing more relevant information. Here's an updated format I implemented in f3e0af6. What do you think?

~/g/git-lfs (migrate-subcommand-info) $ ./bin/git-lfs migrate info \
  --top=5 --above=10kb \
  --include-ref=refs/heads/migrate-subcommand-info
  --exclude-ref=refs/heads/master
Files above 10 KB:
*.go    906 KB, 48/479 files(s)  10%
*.bmp      154 KB, 1/2 files(s)  50%
*.sh      149 KB, 8/75 files(s)  11%
*.md      114 KB, 7/45 files(s)  16%
*.ico       34 KB, 1/1 files(s) 100%

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really great. A few comments:

}

// TODO(@ttaylorr): needed?
exts[ext] = entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's either needed here, or in the entry == nil block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either works, but I dig the explicitness of exts[ext] = entry. Removed the TODO comment in 6b94d8b.

}

// MigrateInfoEntry represents a tuple of filetype to total size taken by that
// file type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description is a bit out of date, since the file type to entry mapping is the exts var.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Updated those docs and added some godoc to the fields on this type in 7c77064.

percentage := fmt.Sprintf("%.0f%%", percentAbove)

percentages = append(percentages, percentage)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor things about this:

First, it's given an io.Writer, but the output is buffered. It's not exactly a lot of data, but I'm just used to writing it out with fmt.Fprintf if the output doesn't need post-processing. However, having to deal with just 1 err from fmt.Fprintln is kinda nice.

Second: You could reduce how often e is looped by building extensions, files, and percentages at the same time. At a high level, the structure of this func would be something like:

  1. Init extensions, files, percentages
  2. Loop through e, appending to those slices
  3. Rjust or Ljust them
  4. Write header
  5. Write each line with the justified extensions, files, percentages

I think structurally it makes more sense, but 36179de saves a whopping 3 LOC (7 if I didn't add comments).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, having to deal with just 1 err from fmt.Fprintln is kinda nice.

Totally agree.

Second: You could reduce how often e is looped by building extensions, files, and percentages at the same time.

Thank you! This is such an awesome suggestion. I changed this in 0dbc358 and it made the code much cleaner.

@technoweenie
Copy link
Contributor

technoweenie commented Jun 13, 2017

I think this output looks great, but the file sizes should be aligned too:

Files above 10 KB:
*.go    906 KB, 48/479 files(s)  10%
*.bmp      154 KB, 1/2 files(s)  50%
*.sh      149 KB, 8/75 files(s)  11%
*.md      114 KB, 7/45 files(s)  16%
*.ico       34 KB, 1/1 files(s) 100%

Aligned, it'd look something like:

*.go 	906 KB	48/480 files(s)	 10%
*.bmp	154 KB	   1/2 files(s)	 50%
*.sh 	149 KB	  8/75 files(s)	 11%
*.md 	114 KB	  7/45 files(s)	 16%
*.ico	34 KB 	   1/1 files(s)	100%

In chat, you questioned out loud about how to parse this output with scripts:

  • I know I suggested moving it to a header, but maybe Files above 10 KB: is not necessary? I don't think I need the command to repeat the --above flag to me.
  • While humanized sizes are definitely the way to go by default, IMO, what about introducing a flag to specify the unit? --size-unit=[ bytes | M | etc ].

migrateInfoAboveFmt string
// migrateInfoAbove is the number of bytes parsed from the above
// migrateInfoAboveFmt flag.
migrateInfoAbove uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think migrateInfoAbove should have a sensible default, like 1MB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this output looks great, but the file sizes should be aligned too:

Good call: I built up a size array as well, and aligned it in 7d2dbd6. I think it looks much better this way!

@ttaylorr
Copy link
Contributor Author

I know I suggested moving it to a header, but maybe Files above 10 KB: is not necessary? I don't think I need the command to repeat the --above flag to me.

I agree, and I really dig this reasoning. I think it a) makes the output easier to look at, b) doesn't repeat already known information, and c) makes the output easier to parse with Unix-y tools. Removed the header in ae76e72.

  • While humanized sizes are definitely the way to go, IMO, what about introducing a flag to specify the unit? --size-unit=[ bytes | M | etc ].

Totally possible, and I think this would make sense for a lot of command-line tools that want to parse the output.

@ttaylorr
Copy link
Contributor Author

I think migrateInfoAbove should have a sensible default, like 1MB.

Good call, I made the default 1MB in 97b0f36. This can be easily changed in the future if 1MB is too high/low.

  • While humanized sizes are definitely the way to go, IMO, what about introducing a flag to specify the unit? --size-unit=[ bytes | M | etc ].

Totally possible, and I think this would make sense for a lot of command-line tools that want to parse the output.

Started looking into this, but I think this may be best done in a separate PR after this one, since it will involve changes in both the humanize package and the info command.

@ttaylorr
Copy link
Contributor Author

@technoweenie so long as I follow up on the --size-unit flag in a separate PR, I think this is ready to go.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 🤘

@ttaylorr ttaylorr merged commit 56871cf into master Jun 16, 2017
@ttaylorr
Copy link
Contributor Author

Thanks for the review!

@ttaylorr ttaylorr deleted the migrate-subcommand-info branch June 19, 2017 15:53
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
The ClampInt() function does not always return a value
between its "min" and "max" arguments because it reverses
their meaning, and the relevant tests in the corresponding
tools/math_test.go file do not report the problem because
do not run when the "go test ./tools" command is run.  The
latter bug is a result of the test functions lacking the
necessary Test* prefix to be recognized by the Go "test"
command.

After we fix the ClampInt() function, we rename all the
test functions in tools/math_test.go to follow the Test*()
naming scheme, which ensures they now run properly.

Lastly, we change the one caller of the ClampInt() function,
the migrateInfoCommand() function, which previously worked
correctly because it also reversed the order of the
arguments it passed to the function.  That reversal was
introduced in commit bd2e1b2
in PR git-lfs#2313, as part of the development of the "migrate info"
command, presumably because the author discovered the call
was not working as expected.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
The ClampInt() function does not always return a value
between its "min" and "max" arguments because it reverses
their meaning, and the relevant tests in the corresponding
tools/math_test.go file do not report the problem because
they do not execute when the "go test ./tools" command is run.
The latter bug is a result of the test functions lacking the
necessary Test* prefix to be recognized by the Go "test"
command.

After we fix the ClampInt() function, we rename all the
test functions in tools/math_test.go to follow the Test*()
naming scheme, which ensures they now run properly.

Lastly, we change the one caller of the ClampInt() function,
the migrateInfoCommand() function, which previously worked
correctly because it also reversed the order of the
arguments it passed to the function.  That reversal was
introduced in commit bd2e1b2
in PR git-lfs#2313, as part of the development of the "migrate info"
command, presumably because the author discovered the call
was not working as expected.
pcal43 pushed a commit to pcal43/git-lfs-hack that referenced this pull request Jul 22, 2021
The ClampInt() function does not always return a value
between its "min" and "max" arguments because it reverses
their meaning, and the relevant tests in the corresponding
tools/math_test.go file do not report the problem because
they do not execute when the "go test ./tools" command is run.
The latter bug is a result of the test functions lacking the
necessary Test* prefix to be recognized by the Go "test"
command.

After we fix the ClampInt() function, we rename all the
test functions in tools/math_test.go to follow the Test*()
naming scheme, which ensures they now run properly.

Lastly, we change the one caller of the ClampInt() function,
the migrateInfoCommand() function, which previously worked
correctly because it also reversed the order of the
arguments it passed to the function.  That reversal was
introduced in commit bd2e1b2
in PR git-lfs#2313, as part of the development of the "migrate info"
command, presumably because the author discovered the call
was not working as expected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants