Skip to content

Improve stats#3733

Merged
MichaelEischer merged 5 commits intomasterfrom
improve-stats
Jul 2, 2022
Merged

Improve stats#3733
MichaelEischer merged 5 commits intomasterfrom
improve-stats

Conversation

@fd0
Copy link
Copy Markdown
Member

@fd0 fd0 commented May 1, 2022

What does this PR change? What problem does it solve?

Print the number of bytes added to the repo (including optional compression and the crypto overhead).

Checklist

  • Add real size to the stats command
  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

@fd0 fd0 force-pushed the improve-stats branch from 0d230d7 to 1419002 Compare May 1, 2022 13:14
@MichaelEischer
Copy link
Copy Markdown
Member

The code looks fine. But currently "SizeInRepo" feels a bit misleading. It will systematically underreport how much data was added to the repository, as the pack header overhead is missing. My suggestion would be to either immediately report the pack entry header size along with the blob length or to report that overhead together with the blob that triggers a pack file upload. The downside of the second variant is that it would either require modifications to repo.Flush() or not account for the last few pack headers.

@fd0
Copy link
Copy Markdown
Member Author

fd0 commented May 29, 2022

Ah, right. I'm in favor of the second option, even if that's not exact and will not report the last few header sizes. Since the header size will depend on the repo version I think it's less complex this way.

@fd0 fd0 force-pushed the improve-stats branch from 1419002 to 10d3df2 Compare May 29, 2022 14:22
@MichaelEischer MichaelEischer force-pushed the improve-stats branch 2 times, most recently from e8590ec to 8980f97 Compare June 4, 2022 22:32
@MichaelEischer MichaelEischer linked an issue Jun 4, 2022 that may be closed by this pull request
@MichaelEischer MichaelEischer force-pushed the improve-stats branch 2 times, most recently from 7d00f61 to f46a441 Compare June 5, 2022 09:54
@MichaelEischer
Copy link
Copy Markdown
Member

I've implemented a mix of both variants. We already have a method to calculate the size of the pack header entry for a given blob, which make accounting for the per blob overhead a one-line change. When finalizing a pack file, now the pack header crypto overhead is reported and added for the last blob.

Adding the pack header entry overhead for each blob individually also has the benefit that the "stored size" for each file won't have large random spikes due to charging the full pack overhead to a single blob.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge this now and wait for feedback whether the stats need further improvements.

fd0 and others added 5 commits July 2, 2022 18:55
This includes optional compression and crypto overhead.
raw-data summed up the size of the blob plaintexts. However, with
compression this makes little sense as the storage size in the
repository is lower due to compression. Thus sum up the actual size each
blob takes in the repository.
This will miss the pack header crypto overhead and the length field,
which only amount to a few bytes per pack file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Total Size in stats is not correct when backup is compressed

2 participants