Skip to content

Conversation

@carlossanlop
Copy link
Contributor

@carlossanlop carlossanlop commented Apr 21, 2025

This PR fixes two bugs in the GNU format:

  1. The atime and ctime fields (unique to this format) need to be written as null characters if their value is the Unix Epoch. The bug was that we were writing them as "0" characters as other GNU metadata fields. This was causing tools like 7-zip to show an extra unexpected folder with a name formed by just "0" characters. Other tools were unable to read these archives as they were considered corrupt.

    Added sync and async tests to validate directly in the memory stream that when atime and ctime represent the Unix Epoch, they are indeed written as null characters.

  2. The unused bytes of the size metadata field should be filled with "0" characters. The bug was that we were filling them with nulls like in the other formats.
    Added sync and async tests to validate directly in the memory stream that the unused bytes in the size field are indeed written as "0" characters.

    Added sync and async tests to validate directly in the memory stream that when atime and ctime represent the Unix Epoch, they are indeed written as null characters.

Manual test: reading entries in a GNU archive generated by another tool, then rewriting each entry into a new archive created with TarWriter. Then opening it with 7-zip.

Before - Note how 7-zip did its best to deduce the minimum required fields, but most was empty:
image

After - Note that 7-zip now shows many other fields that it was unable to read before:
image

…eld should be filled with "0" characters. The bug was that we were filling them with nulls like in the other formats.

Added sync and async tests to validate directly in the memory stream that the unused bytes in the size field are indeed written as "0" characters.
…is format) need to be written as null characters if their value is the Unix Epoch. The bug was that we were writing them as "0" characters as other GNU metadata fields. This was causing tools like 7-zip to show an extra unexpected folder with a name formed by just "0" characters. Other tools were unable to read these archives as they were considered corrupt.

Added sync and async tests to validate directly in the memory stream that when atime and ctime represent the Unix Epoch, they are indeed written as null characters.
@carlossanlop carlossanlop self-assigned this Apr 21, 2025
Copilot AI review requested due to automatic review settings April 21, 2025 20:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues with GNU tar entry metadata where atime and ctime fields are incorrectly written as "0" characters instead of null characters, and ensures that unused bytes in the size field are correctly set to "0" characters. Key changes include adding sync and async tests for these behaviors and adjusting header writing logic to correctly format these fields.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/libraries/System.Formats.Tar/tests/TarEntry/GnuTarEntry.Tests.cs Added tests to verify unused size field bytes and correct handling of Unix Epoch timestamps
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs Modified header writing logic to remove unnecessary _dataStream check and return 0 for UnixEpoch timestamps

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

@mmitche
Copy link
Member

mmitche commented Apr 21, 2025

#1 and #2 fix in your PR description appear to be the same. Can you update?

@carlossanlop
Copy link
Contributor Author

carlossanlop commented Apr 21, 2025

#1 and #2 fix in your PR description appear to be the same. Can you update?

Ah sorry. Fixed. The correct descriptions are in the commits themselves, anyway.

@carlossanlop
Copy link
Contributor Author

Need to adjust the checksum unit tests, as the expected values are wrong now.

…egular timestamps (mtime) should not be affected by this.

Modify all the sync and async checksum shared code so that they reflect the changes in the expected values of size, atime and ctime.
@carlossanlop
Copy link
Contributor Author

carlossanlop commented Apr 22, 2025

I pushed an extra commit to separate the new timestamp checking logic from the old method and created another one just for the gnu timestamps. I also fixed all the checksum unit tests, sync and async, since the expected values for gnu were now different. I also separated the gnu checksum verification tests into their own separate ones since they need to additionaly verify the epoch values (no other formats will have this issue).

@ViktorHofer ViktorHofer merged commit 8c3a2e1 into dotnet:main Apr 22, 2025
81 of 85 checks passed
@carlossanlop carlossanlop deleted the FixGnu branch April 22, 2025 16:30
@github-actions github-actions bot locked and limited conversation to collaborators May 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants