@PrasadG193 (Contributor) commented Aug 28, 2024

Overview

This PR adds support for configuring the compressor used for metadata, i.e. k- and x-prefixed content (directory and indirect content).

This PR:

  • Adds a metadata compression setting to the policy
  • Sets zstd-fastest as the default metadata compressor in the policy
  • Adds support for setting and showing metadata compression in the kopia policy commands
  • Adds metadata compression config to the content WriterOptions

Addresses: #4081

Test plan

  1. Initialize a repo and check the default policy. Validate that the default metadata compression is zstd-fastest.
$ kopia policy show --global
.
.
Compression disabled.

Metadata compression:
  Compressor:                   zstd-fastest   (defined for this target)
.
.
  2. Create a 500M file of random data in the ./repo directory.
  3. Snapshot the ./repo directory and observe the stats. Validate that content with the k and x prefixes is compressed with zstd-fastest.
$ kopia content stats                                
Count: 356
Total Bytes: 501.3 MB
Total Packed: 501.2 MB (compression 0.0%)
By Method:
  (uncompressed)         count: 323 size: 501.2 MB
  zstd-fastest           count: 33 size: 58 KB packed: 20.8 KB compression: 64.1%
Average: 1.4 MB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        0 between 10 B and 100 B (total 0 B)
       75 between 100 B and 1 KB (total 42.1 KB)
      160 between 1 KB and 10 KB (total 577.3 KB)
       30 between 10 KB and 100 KB (total 615.1 KB)
        0 between 100 KB and 1 MB (total 0 B)
       91 between 1 MB and 10 MB (total 500 MB)
        0 between 10 MB and 100 MB (total 0 B)


$ kopia content list --compression  | grep -E '^(x|k)'
k007863a722dae6e1f74bfa28621e9db7 length 497 packed 311 zstd-fastest 37.4% 
k038aab6d0bdd41c7a6c97747cb603dc8 length 648 packed 346 zstd-fastest 46.6% 
k0ab4f42158d52c06c94ad60d27a1c5d3 length 1660 packed 552 zstd-fastest 66.7% 
k16857bc153b551faa4f80992b6565a41 length 826 packed 399 zstd-fastest 51.7% 
k18c0fae9ac6dd39bcdfac44c1f75a728 length 658 packed 352 zstd-fastest 46.5% 
k2b454dbe5c39a09079aeae2a1af1223d length 1135 packed 474 zstd-fastest 58.2% 
k2b5468c5e9b8cd76b5f1f05967d2ee2d length 4305 packed 1205 zstd-fastest 72.0% 
k33ad0bd921cd1c2ab2eff1ad9ecc66ba length 820 packed 388 zstd-fastest 52.7% 
k3769ca93b5b6126f9fbb5490d7c13b44 length 2628 packed 769 zstd-fastest 70.7% 
k3d382629cd835df41d9cd2bc5c2a3afa length 495 packed 311 zstd-fastest 37.2% 
k4a2c0e6cf92774249ded4b7d65c86c8b length 1028 packed 453 zstd-fastest 55.9% 
k61aad0bf9232c1a9cbfc9dbafef21d80 length 818 packed 390 zstd-fastest 52.3% 
k6a4287d708e731165f9b70e1ebb4955a length 1157 packed 465 zstd-fastest 59.8% 
k6a9ddd587be38b8ed9d7603ca842f28a length 970 packed 421 zstd-fastest 56.6% 
k730a3c700d5e94e30e44e0db56e66855 length 660 packed 352 zstd-fastest 46.7% 
k84972793e968f8684f128ce18e6b6f4e length 2441 packed 782 zstd-fastest 68.0% 
k8ac1677ddcb906ea982db836d8ee0f5a length 1327 packed 506 zstd-fastest 61.9% 
k9dbee50c252d104acea29941e67c4c31 length 323 packed 267 zstd-fastest 17.3% 
kbb2aba0e8ca2484328101d9b476da99d length 3589 packed 1045 zstd-fastest 70.9% 
kbb4c4001dd8bae0a1c622baa148f35cb length 481 packed 304 zstd-fastest 36.8% 
kc2d5ee4763679bd415c69d4def08ec96 length 1315 packed 512 zstd-fastest 61.1% 
kc524ff99fcb9c39dd254bde1a893d83d length 1603 packed 592 zstd-fastest 63.1% 
kcb2f20aa3ef7c9515994512cd0ad5f4c length 318 packed 256 zstd-fastest 19.5% 
kcb933f9724207f5a8e919cc8febefe1f length 5556 packed 1330 zstd-fastest 76.1% 
kcbcf24b8a0bc543902cb98ac17389bd2 length 1200 packed 468 zstd-fastest 61.0% 
kce826339bef385f532414fb446f48a5d length 1290 packed 503 zstd-fastest 61.0% 
kd45ed696c4bb7aa89bb1bb7368439aa8 length 7102 packed 1861 zstd-fastest 73.8% 
ke315551eae3378d0ef8af200b13c18f4 length 656 packed 351 zstd-fastest 46.5% 
ke42c984de4242697ecdfd79dd8ca6d3c length 3030 packed 922 zstd-fastest 69.6% 
ke466ceaabc5b3e2cc4605fa9802f6a1a length 1515 packed 545 zstd-fastest 64.0% 
kf300eb87fa333e08274f458d418721af length 1313 packed 532 zstd-fastest 59.5% 
kfffe9471fbbf3fd85f2d6ef0d745b03d length 496 packed 310 zstd-fastest 37.5% 
xbf563e1b71bd6786c61b42fd4c781eba length 6106 packed 2539 zstd-fastest 58.4% 
  4. Set the metadata compression of the global policy to s2-default.
$ kopia policy set --global --metadata-compression=s2-default
Setting policy for (global)
 - setting metadata compression algorithm to s2-default


$ kopia policy show --global 
.
.
Compression disabled.

Metadata compression:
  Compressor:                     s2-default   (defined for this target)
.
.
  5. Create a new 500M file of random data in the ./internal directory. Snapshot the ./internal directory and view the content stats. Validate that new metadata is compressed with s2-default.
$ kopia content stats                                        
Count: 796
Total Bytes: 1 GB
Total Packed: 1 GB (compression 0.0%)
By Method:
  (uncompressed)         count: 691 size: 1 GB
  zstd-fastest           count: 33 size: 58 KB packed: 20.8 KB compression: 64.1%
  s2-default             count: 72 size: 81.1 KB packed: 47 KB compression: 42.0%
Average: 1.3 MB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        0 between 10 B and 100 B (total 0 B)
      205 between 100 B and 1 KB (total 107.6 KB)
      364 between 1 KB and 10 KB (total 1.2 MB)
       47 between 10 KB and 100 KB (total 898 KB)
        1 between 100 KB and 1 MB (total 102.2 KB)
      179 between 1 MB and 10 MB (total 1 GB)
        0 between 10 MB and 100 MB (total 0 B)

$ kopia content list --compression  | grep -E '^(x|k)'
k007863a722dae6e1f74bfa28621e9db7 length 497 packed 311 zstd-fastest 37.4%   
k038aab6d0bdd41c7a6c97747cb603dc8 length 648 packed 346 zstd-fastest 46.6%  
k05047e1dc21a4d4d16c07fb53c18a021 length 317 packed 327 s2-default 0%      
k052746f226d06896b9df7632c504067e length 489 packed 400 s2-default 18.2%    
k0a3b9dd26b35baa0565067d2d5eb81fd length 474 packed 399 s2-default 15.8%     
k0a5adac9c5bfa7cb3b26b36a4e0305b8 length 1686 packed 850 s2-default 49.6% 
k0ab4f42158d52c06c94ad60d27a1c5d3 length 1660 packed 552 zstd-fastest 66.7% 
k10ec9fc9f43275ea976da2b7bf36a658 length 1347 packed 734 s2-default 45.5% 
k125353fe0f110002f4bbac86a379dd9c length 483 packed 407 s2-default 15.7% 
k1428dd82727539808515340d95c52013 length 501 packed 407 s2-default 18.8% 
k1622ad89b4b5ecc16cbabdd0d5e11ed1 length 320 packed 330 s2-default 0% 
k16857bc153b551faa4f80992b6565a41 length 826 packed 399 zstd-fastest 51.7% 
k18c0fae9ac6dd39bcdfac44c1f75a728 length 658 packed 352 zstd-fastest 46.5% 
.
.
.
kf9f4b766e4cc4f4bf0d9e4ef6c0a15ab length 5962 packed 2457 s2-default 58.8% 
kfc5d0dab5c4cd7a1c7e01076ff0b94ef length 487 packed 401 s2-default 17.7% 
kfffe9471fbbf3fd85f2d6ef0d745b03d length 496 packed 310 zstd-fastest 37.5% 
xaba18c8daba06062782d5ad11c734c6e length 5905 packed 4696 s2-default 20.5% 
xbf563e1b71bd6786c61b42fd4c781eba length 6106 packed 2539 zstd-fastest 58.4% 
  6. Disable metadata compression for the ./tests directory.
$ kopia policy set ./tests --metadata-compression=none

$ kopia policy show ./tests
.
.
Compression disabled.

Metadata compression disabled.
.
.
  7. Create a 500M file of random data in the ./tests directory, snapshot the directory, and inspect the content. New metadata should show up as uncompressed.
$ kopia content stats                                 
Count: 1050
Total Bytes: 1.5 GB
Total Packed: 1.5 GB (compression 0.0%)
By Method:
  (uncompressed)         count: 945 size: 1.5 GB
  zstd-fastest           count: 33 size: 58 KB packed: 20.8 KB compression: 64.1%
  s2-default             count: 72 size: 81.1 KB packed: 47 KB compression: 42.0%
Average: 1.4 MB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        2 between 10 B and 100 B (total 180 B)
      250 between 100 B and 1 KB (total 131.6 KB)
      461 between 1 KB and 10 KB (total 1.6 MB)
       62 between 10 KB and 100 KB (total 1.2 MB)
        1 between 100 KB and 1 MB (total 102.2 KB)
      274 between 1 MB and 10 MB (total 1.5 GB)
        0 between 10 MB and 100 MB (total 0 B)

$ kopia content list --compression  | grep -E '^(x|k)'
k007863a722dae6e1f74bfa28621e9db7 length 497 packed 311 zstd-fastest 37.4%  
k038aab6d0bdd41c7a6c97747cb603dc8 length 648 packed 346 zstd-fastest 46.6%  
k045f25399f14e293ae7077cff1c3b7e7 length 4525 packed 4553 -                  
k05047e1dc21a4d4d16c07fb53c18a021 length 317 packed 327 s2-default 0%      
k051da15776d02f3dcb315e5a273ab657 length 1431 packed 1459 -                
k052746f226d06896b9df7632c504067e length 489 packed 400 s2-default 18.2%     
k0a3b9dd26b35baa0565067d2d5eb81fd length 474 packed 399 s2-default 15.8%    
k0a5adac9c5bfa7cb3b26b36a4e0305b8 length 1686 packed 850 s2-default 49.6% 
k0ab4f42158d52c06c94ad60d27a1c5d3 length 1660 packed 552 zstd-fastest 66.7% 
.
.
.
kfc3cc5f8dc7441f9aebadde89786a894 length 991 packed 1019 - 
kfc5d0dab5c4cd7a1c7e01076ff0b94ef length 487 packed 401 s2-default 17.7% 
kfffe9471fbbf3fd85f2d6ef0d745b03d length 496 packed 310 zstd-fastest 37.5% 
x42109551bb0d7bd2860a46618d5009ab length 6369 packed 6397 - 
xaba18c8daba06062782d5ad11c734c6e length 5905 packed 4696 s2-default 20.5% 
xbf563e1b71bd6786c61b42fd4c781eba length 6106 packed 2539 zstd-fastest 58.4% 
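
The listings in this test plan can be condensed into one row per compressor, which makes it easier to confirm that each step only added metadata with the expected compressor. The following is a minimal sketch, assuming the column layout shown above; summarize_by_compressor is a hypothetical helper written for illustration and is not part of kopia.

```shell
#!/bin/sh
# Summarize `kopia content list --compression` output (read from stdin) per
# compressor. Assumes the column layout shown in this test plan:
#   <content ID> length <bytes> packed <bytes> <compressor or "-"> [<percent>]
# summarize_by_compressor is a hypothetical helper, not part of kopia.
summarize_by_compressor() {
  awk '
    $2 == "length" && $4 == "packed" && $3 > 0 {
      count[$6]++        # number of contents using this compressor
      orig[$6] += $3     # total uncompressed bytes
      packed[$6] += $5   # total bytes as stored
    }
    END {
      for (c in count) {
        printf "%s count=%d length=%d packed=%d savings=%.1f%%\n",
          c, count[c], orig[c], packed[c], (1 - packed[c] / orig[c]) * 100
      }
    }
  ' | LC_ALL=C sort   # awk iteration order is unspecified, so sort the rows
}

# Sample rows copied from the final listing above.
summarize_by_compressor <<'EOF'
k007863a722dae6e1f74bfa28621e9db7 length 497 packed 311 zstd-fastest 37.4%
k038aab6d0bdd41c7a6c97747cb603dc8 length 648 packed 346 zstd-fastest 46.6%
k045f25399f14e293ae7077cff1c3b7e7 length 4525 packed 4553 -
k05047e1dc21a4d4d16c07fb53c18a021 length 317 packed 327 s2-default 0%
k051da15776d02f3dcb315e5a273ab657 length 1431 packed 1459 -
k052746f226d06896b9df7632c504067e length 489 packed 400 s2-default 18.2%
EOF
```

Against a real repository, the input would come from the same pipeline used in the test plan: kopia content list --compression | grep -E '^(x|k)' | summarize_by_compressor. In the rows listed above, every uncompressed (-) entry has a packed size exactly 28 bytes larger than its length (for example 4525 vs 4553), so the savings for that group come out slightly negative. The cause of that fixed overhead is not stated in this PR.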

@PrasadG193 force-pushed the configure-metadata-compression branch from dff0752 to ba2ae1d on August 28, 2024 07:11
Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
@PrasadG193 force-pushed the configure-metadata-compression branch from ba2ae1d to d11e963 on August 28, 2024 07:22

@Shrekster (Collaborator) commented:

We reviewed this change internally and also here: kastenhq#560

codecov bot commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 82.07547% with 19 lines in your changes missing coverage. Please review.

Project coverage is 77.23%. Comparing base (cb455c6) to head (2fc49a6).
Report is 311 commits behind head on master.

Files with missing lines | Patch % | Lines
cli/command_policy_set_compression.go | 33.33% | 7 Missing and 1 partial ⚠️
cli/command_policy_show.go | 66.66% | 2 Missing and 1 partial ⚠️
cli/command_policy_set.go | 33.33% | 1 Missing and 1 partial ⚠️
cli/command_snapshot_fix.go | 60.00% | 1 Missing and 1 partial ⚠️
repo/grpc_repository_client.go | 0.00% | 2 Missing ⚠️
snapshot/snapshotfs/dir_rewriter.go | 87.50% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4080      +/-   ##
==========================================
+ Coverage   75.86%   77.23%   +1.37%     
==========================================
  Files         470      500      +30     
  Lines       37301    29323    -7978     
==========================================
- Hits        28299    22649    -5650     
+ Misses       7071     4725    -2346     
- Partials     1931     1949      +18     


@PrasadG193 changed the title from "feat(repository): Metadata compression config support for indirect content" to "feat(repository): Metadata compression config support for directory and indirect content" on Sep 16, 2024
@PrasadG193 (Contributor, Author) commented:

Hey @jkowalski,
could you please have another look?

@jkowalski (Contributor) commented:

Approved. There are some test failures, but I'm OK with the PR as long as those are fixed and the content manager layer remains as-is.

@PrasadG193 (Contributor, Author) commented:

@jkowalski @Shrekster I've fixed the test. Now that we always pass the compressor for metadata content, we need an explicit check so that it is not applied to formats older than V2. Please review the latest commit and let me know if it makes sense.

@PrasadG193 requested a review from jkowalski on October 15, 2024 15:50

@PrasadG193 (Contributor, Author) commented:

@jkowalski could you please check whether the CI fixes look okay?

@jkowalski (Contributor) left a comment:

LGTM!

@jkowalski merged commit 3bf947d into kopia:master on Oct 24, 2024
alvistar pushed a commit to alvistar/kopia that referenced this pull request Oct 29, 2024
…nd indirect content (kopia#4080)

* Configure compressor for k and x prefixed content

Adds metadata compression setting to policy
Add support to configure compressor for k and x prefixed content
Set zstd-fastest as the default compressor for metadata in the policy
Adds support to set and show metadata compression to kopia policy commands
Adds metadata compression config to dir writer

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Pass concatenate options with ConcatenateOptions struct

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move content compression handling to caller

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Move handling manifests to manifest pkg

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Correct const in server_test

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Remove unnecessary whitespace

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

* Disable metadata compression for < V2 format

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>

---------

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
mcamou pushed a commit to mcamou/kopia that referenced this pull request Oct 30, 2024
…nd indirect content (kopia#4080)
