Description
System.Formats.Tar.TarReader does not handle GNU sparse format 1.0 entries encoded via PAX extended attributes. When reading such entries, TarEntry.Name returns the internal placeholder path (containing GNUSparseFile.0) instead of the real file name, and TarEntry.Length returns the stored (sparse) size rather than the real file size.
GNU sparse format 1.0 stores the real name and size in PAX extended attributes:
- GNU.sparse.name: the real file path
- GNU.sparse.realsize: the real file size
TarHeader.ReplaceNormalAttributesWithExtended() processes standard PAX attributes like path, size, mtime, etc., but does not process GNU.sparse.name or GNU.sparse.realsize.
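Until TarReader honors these attributes, a caller can apply them itself, much as the repro program below already reads them. The following is a minimal sketch that assumes only the public System.Formats.Tar API. GetRealName and GetRealLength are hypothetical helper names, and the entry is built in memory from the example output's values instead of being read from a real archive. The sketch corrects only the reported metadata. The entry's data stream still holds the stored sparse data.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.Globalization;

// Hypothetical workaround helpers (not System.Formats.Tar APIs): prefer the GNU sparse 1.0
// PAX attributes when present, otherwise fall back to what TarReader reports.
static string GetRealName(TarEntry entry) =>
    entry is PaxTarEntry pax
    && pax.ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? name)
        ? name
        : entry.Name;

static long GetRealLength(TarEntry entry) =>
    entry is PaxTarEntry pax
    && pax.ExtendedAttributes.TryGetValue("GNU.sparse.realsize", out string? value)
    && long.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out long size)
        ? size
        : entry.Length;

// An in-memory entry shaped like the affected ones (values copied from the example output).
var sparse = new PaxTarEntry(
    TarEntryType.RegularFile,
    "./shared/Microsoft.NETCore.App/11.0.0-ci/GNUSparseFile.0/Microsoft.CSharp.dll",
    new Dictionary<string, string>
    {
        ["GNU.sparse.name"] = "./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll",
        ["GNU.sparse.realsize"] = "1115136",
    });

string realName = GetRealName(sparse);
long realLength = GetRealLength(sparse);
Console.WriteLine($"Name:   {sparse.Name} -> {realName}");
Console.WriteLine($"Length: {sparse.Length} -> {realLength}");
```

The helpers match on the attribute rather than on the GNUSparseFile.0 path segment, so they do not depend on the exact placeholder format that a given archiver chooses.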
How this occurs in practice
macOS ships bsdtar (libarchive), which detects sparse files by default during archive creation. .NET DLLs on APFS have zero-filled PE alignment sections that APFS stores as filesystem holes, causing bsdtar to treat them as sparse and encode them with the GNU sparse PAX format.
The tar command producing the affected archive was:
tar -cf - . | pigz > output.tar.gz
When .NET's TarReader reads these archives, ~46% of entries (91 of 199 in the example output below) have incorrect names containing GNUSparseFile.0.
Reproduction Steps
Option 1 — With an affected tar.gz file
Download an affected tarball (a .NET SDK built on macOS):
dotnet-sdk-11.0.100-ci-osx-x64.tar.gz
Then run the repro program (below) against it.
Option 2 — Create a sparse tar.gz on macOS
On a Mac, create a sparse file and archive it:
# Create a file with sparse holes
dd if=/dev/zero of=sparse.bin bs=1 count=0 seek=1048576
echo "hello" >> sparse.bin
# Archive it (bsdtar detects sparse by default)
tar -czf sparse.tar.gz sparse.bin
Then read it on any platform with the repro program below.
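Without a Mac, an archive with a similar shape can be synthesized with TarWriter. This sketch assumes the GNU sparse 1.0 layout described in the GNU tar documentation: a placeholder header name, the GNU.sparse.* attributes, and a data section made of a NUL-padded sparse map followed by the stored bytes. bsdtar's actual output may differ, including the placeholder path it picks. The output name synthetic-sparse.tar.gz is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;

// Assumption: mirrors the GNU sparse 1.0 PAX layout for the sparse.bin file above
// (a 1 MiB hole followed by "hello\n"); bsdtar's exact output may differ.
const long RealSize = 1048576 + 6;

// Data section: sparse map (segment count, then offset/size pairs, one decimal per line),
// NUL-padded to a 512-byte block, followed by the stored (non-hole) bytes only.
var data = new MemoryStream();
byte[] map = Encoding.ASCII.GetBytes("1\n1048576\n6\n");
data.Write(map);
data.Write(new byte[512 - map.Length]);
data.Write(Encoding.ASCII.GetBytes("hello\n"));
data.Position = 0;
long storedSize = data.Length;

var entry = new PaxTarEntry(
    TarEntryType.RegularFile,
    "GNUSparseFile.0/sparse.bin", // assumed placeholder name stored in the header
    new Dictionary<string, string>
    {
        ["GNU.sparse.major"] = "1",
        ["GNU.sparse.minor"] = "0",
        ["GNU.sparse.name"] = "sparse.bin",
        ["GNU.sparse.realsize"] = RealSize.ToString(CultureInfo.InvariantCulture),
    })
{
    DataStream = data,
};

string output = args.Length > 0 ? args[0] : "synthetic-sparse.tar.gz";
using (FileStream fs = File.Create(output))
using (var gz = new GZipStream(fs, CompressionLevel.Optimal))
using (var writer = new TarWriter(gz, TarEntryFormat.Pax, leaveOpen: false))
{
    writer.WriteEntry(entry);
}

Console.WriteLine($"Wrote {output}: stored size {storedSize}, real size {RealSize}");
```

Because the repro program below detects affected entries by the GNU.sparse.name attribute, it should report one sparse entry for the generated file.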
Repro Program
Program.cs:
using System.Formats.Tar;
using System.IO.Compression;

if (args.Length == 0)
{
    Console.Error.WriteLine("Usage: dotnet run -- <path-to-tarball.tar.gz>");
    return 1;
}

string path = args[0];
if (!File.Exists(path))
{
    Console.Error.WriteLine($"File not found: {path}");
    return 1;
}

Console.WriteLine($"Reading: {path}");
Console.WriteLine();

int totalEntries = 0;
int sparseEntries = 0;

using FileStream fs = File.OpenRead(path);
using GZipStream gz = new(fs, CompressionMode.Decompress);
using TarReader reader = new(gz);

while (reader.GetNextEntry() is TarEntry entry)
{
    totalEntries++;

    if (entry is PaxTarEntry pax
        && pax.ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? realName))
    {
        sparseEntries++;
        if (sparseEntries <= 5)
        {
            Console.WriteLine($"Entry #{totalEntries}:");
            Console.WriteLine($" entry.Name (WRONG): {entry.Name}");
            Console.WriteLine($" GNU.sparse.name : {realName}");
            if (pax.ExtendedAttributes.TryGetValue("GNU.sparse.realsize", out string? realSize))
            {
                Console.WriteLine($" entry.Length : {entry.Length}");
                Console.WriteLine($" GNU.sparse.realsize: {realSize}");
            }
            Console.WriteLine();
        }
    }
}

Console.WriteLine($"Total entries : {totalEntries}");
Console.WriteLine($"Sparse entries: {sparseEntries}");

if (sparseEntries > 0)
{
    Console.WriteLine();
    Console.WriteLine("BUG: TarReader exposes internal 'GNUSparseFile.0' placeholder paths");
    Console.WriteLine(" instead of using the real name from GNU.sparse.name.");
}

return sparseEntries > 0 ? 1 : 0;
tar-repro.csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>
Expected behavior
For entries with GNU.sparse.name and GNU.sparse.realsize PAX extended attributes:
- entry.Name should return the value of GNU.sparse.name (e.g., ./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll)
- entry.Length should return the value of GNU.sparse.realsize (e.g., 1115136)
Actual behavior
- entry.Name returns the internal placeholder path (e.g., ./shared/Microsoft.NETCore.App/11.0.0-ci/GNUSparseFile.0/Microsoft.CSharp.dll)
- entry.Length returns the stored (sparse) size (e.g., 791040)
Example output from the repro against the linked tarball:
Reading: dotnet-sdk-11.0.100-ci-osx-x64.tar.gz
Entry #9:
entry.Name (WRONG): ./shared/Microsoft.NETCore.App/11.0.0-ci/GNUSparseFile.0/Microsoft.CSharp.dll
GNU.sparse.name : ./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll
entry.Length : 791040
GNU.sparse.realsize: 1115136
Total entries : 199
Sparse entries: 91
BUG: TarReader exposes internal 'GNUSparseFile.0' placeholder paths
instead of using the real name from GNU.sparse.name.
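The gap between the two lengths in this output is consistent with how the GNU tar documentation describes format 1.0. The stored data section holds a sparse map followed only by the non-hole bytes. The map is a segment count, then offset/size pairs, each a decimal number on its own line, NUL-padded to a 512-byte block. Exposing the real name and size would therefore presumably not, on its own, make the entry's DataStream return the file's real contents. The sketch below shows the expansion step under those assumptions. ExpandGnuSparse10 and WriteZeros are hypothetical names, not System.Formats.Tar APIs. Segments are assumed sorted and non-overlapping, and the demo input reuses the sparse.bin layout from the reproduction steps (a 1 MiB hole followed by "hello\n").

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper (not a System.Formats.Tar API): expands a GNU sparse 1.0 data section,
// i.e. a sparse map followed by the stored bytes, into the full file, writing zeros for holes.
static void ExpandGnuSparse10(Stream data, long realSize, Stream destination)
{
    long consumed = 0; // bytes of the map read so far, needed to skip its padding

    // Each map value is a decimal number terminated by '\n'.
    long ReadNumber()
    {
        var digits = new StringBuilder();
        while (true)
        {
            int b = data.ReadByte();
            if (b < 0) throw new InvalidDataException("Truncated sparse map.");
            consumed++;
            if (b == '\n') break;
            digits.Append((char)b);
        }
        return long.Parse(digits.ToString(), NumberStyles.None, CultureInfo.InvariantCulture);
    }

    // First the number of segments, then one (offset, size) pair per segment.
    long count = ReadNumber();
    var segments = new List<(long Offset, long Size)>();
    for (long i = 0; i < count; i++)
        segments.Add((ReadNumber(), ReadNumber()));

    // The map is NUL-padded up to the next 512-byte block boundary.
    long padding = (512 - consumed % 512) % 512;
    for (long i = 0; i < padding; i++)
        if (data.ReadByte() < 0) throw new InvalidDataException("Truncated map padding.");

    // Assumes segments are sorted by offset and do not overlap.
    long position = 0;
    byte[] buffer = new byte[81920];
    foreach (var (offset, size) in segments)
    {
        WriteZeros(destination, offset - position); // the hole before this segment
        for (long remaining = size; remaining > 0;)
        {
            int n = data.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (n == 0) throw new InvalidDataException("Truncated segment data.");
            destination.Write(buffer, 0, n);
            remaining -= n;
        }
        position = offset + size;
    }
    WriteZeros(destination, realSize - position); // trailing hole, if any
}

static void WriteZeros(Stream destination, long count)
{
    if (count < 0) throw new InvalidDataException("Segments overlap or exceed the real size.");
    byte[] zeros = new byte[Math.Min(count, 81920)];
    while (count > 0)
    {
        int n = (int)Math.Min(zeros.Length, count);
        destination.Write(zeros, 0, n);
        count -= n;
    }
}

// Demo: the data section for the sparse.bin example (1 MiB hole, then "hello\n").
var sample = new MemoryStream();
byte[] map = Encoding.ASCII.GetBytes("1\n1048576\n6\n");
sample.Write(map);
sample.Write(new byte[512 - map.Length]);
sample.Write(Encoding.ASCII.GetBytes("hello\n"));
sample.Position = 0;

var expanded = new MemoryStream();
ExpandGnuSparse10(sample, 1048582, expanded);
Console.WriteLine($"Stored {sample.Length} bytes, expanded to {expanded.Length} bytes");
```

With TarReader, the input would be the entry's DataStream together with the parsed GNU.sparse.realsize value. The sketch consumes exactly the stored bytes. It therefore assumes that the stored size, not the real size, still bounds how much is read from the archive for that entry.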
Suggested Fix
In TarHeader.ReplaceNormalAttributesWithExtended(), add handling for the GNU sparse PAX attributes after the existing standard attribute processing:
// GNU sparse format 1.0 stores the real name and size in extended attributes.
// The header's name field contains an internal placeholder like "GNUSparseFile.0/...".
if (ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? gnuSparseName))
{
    _name = gnuSparseName;
}

if (TarHelpers.TryGetStringAsBaseTenLong(ExtendedAttributes, "GNU.sparse.realsize", out long gnuSparseRealSize))
{
    _size = gnuSparseRealSize;
}
Configuration
- Affects all .NET versions with System.Formats.Tar (net7.0+)
- Affects all platforms when reading archives created on macOS (or on any system using bsdtar/libarchive with sparse detection)
- The archive creation side can work around this with tar --no-read-sparse, but TarReader should handle this format correctly regardless
Impact
This is a real-world issue affecting .NET CI/CD infrastructure. Archives produced by macOS build agents contain GNU sparse PAX entries for .NET DLLs, and downstream tools using TarReader to process these archives (e.g., for code signing) encounter incorrect paths, leading to build failures.