[DRAFT][WIP]Add Zip Archives password support#122093
[DRAFT][WIP]Add Zip Archives password support#122093alinpahontu2912 wants to merge 60 commits intodotnet:mainfrom
Conversation
…assword and unprotected
… with cryptography lib
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCryptoStream.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/WinZipAesStreamWrappedConformanceTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCryptoStream.cs
Show resolved
Hide resolved
src/libraries/Common/src/Microsoft/Win32/SafeHandles/SafeX509Handles.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/ZipCryptoStreamWrappedConformanceTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.Async.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Outdated
Show resolved
Hide resolved
...stem.IO.Compression.ZipFile/src/System/IO/Compression/ZipFileExtensions.ZipArchive.Create.cs
Show resolved
Hide resolved
src/libraries/Common/src/Microsoft/Win32/SafeHandles/SafeX509Handles.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipBlocks.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.Async.cs
Outdated
Show resolved
Hide resolved
...O.Compression.ZipFile/src/System/IO/Compression/ZipFileExtensions.ZipArchiveEntry.Extract.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/ZipCryptoStreamWrappedConformanceTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.Async.cs
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/WinZipAesStream.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
General feedback:
- Secrets (including an input password) should never be copied to the heap, unless it's utterly impractical.
- Secrets (including an input password) should also only minimally be copied around on the stack.
- Copied secrets should be cleared.
- Avoid having cryptographic keys (HMAC, AES, etc) stored in arrays.
GitHub's current UI is making this change hard to holistically review, and it's marked as draft, so I've only sprinkled in the most important comments.
| uint key2 = 878082192; | ||
|
|
||
| // ZipCrypto uses raw bytes; ASCII is the most interoperable | ||
| var bytes = Encoding.ASCII.GetBytes(password.ToArray()); |
There was a problem hiding this comment.
password.ToArray() is bad, you're making an uncleared copy of the secret.
Since you have a streaming protocol, you can just use a stackalloc buffer and walk it pieces at a time. (Especially because I think you're only on .NET "Core")
const int StackAllocSize = 64;
const int SegmentSize = StackAllocSize / 2;
Debug.Assert(Encoding.ASCII.GetMaxByteCount(SegmentSize) == StackAllocSize);
Span<byte> buf = stackalloc byte[64];
ReadOnlySpan<byte> pwSpan = password.Span;
while (!pwSpan.IsEmpty)
{
ReadOnlySpan<byte> segment = pwSpan;
if (segment.Length > SegmentSize)
{
segment = segment.Slice(0, SegmentSize);
}
int byteCount = Encoding.ASCII.GetBytes(segment, buf);
foreach (byte b in buf.Slice(0, byteCount))
{
UpdateKeys(ref key0, ref key1, ref key2, b);
}
pwSpan = pwSpan.Slice(segment.Length);
}
...| // Creates the persisted key bytes from a password. | ||
| // The returned byte array contains the 3 ZipCrypto keys (key0, key1, key2) | ||
| // serialized as 12 bytes in little-endian format. | ||
| public static byte[] CreateKey(ReadOnlyMemory<char> password) |
There was a problem hiding this comment.
I believe it's always a good idea to avoid public in non-public classes, since code reviewers then have to stop and think "is this externally callable?"
| public static byte[] CreateKey(ReadOnlyMemory<char> password) | |
| private static byte[] CreateKey(ReadOnlyMemory<char> password) |
| } | ||
|
|
||
| // Serialize the 3 keys to bytes in little-endian format | ||
| byte[] keyBytes = new byte[KeySize]; |
There was a problem hiding this comment.
Can we avoid making an array for this?
There was a problem hiding this comment.
AFAICT, it just gets packed into this array then unpacked into 3 integers later. Return a struct of 3 integers... try to keep the key off the heap.
| // Creates the persisted key bytes from a password. | ||
| // The returned byte array contains the 3 ZipCrypto keys (key0, key1, key2) | ||
| // serialized as 12 bytes in little-endian format. | ||
| public static byte[] CreateKey(ReadOnlyMemory<char> password) |
There was a problem hiding this comment.
Why is this taking ReadOnlyMemory when it could take ReadOnlySpan?
| _key2 = BinaryPrimitives.ReadUInt32LittleEndian(keyBytes.AsSpan(8, 4)); | ||
| } | ||
|
|
||
| private byte[] CalculateHeader() |
There was a problem hiding this comment.
Instead of returning an array, can it be given the destination span to write to?
| for (int i = 0; i < KeystreamBufferSize; i += BlockSize) | ||
| { | ||
| BinaryPrimitives.WriteUInt128LittleEndian(counterBlock, _counter); | ||
| _aesEncryptor.TransformBlock(counterBlock, 0, BlockSize, _keystreamBuffer, i); |
There was a problem hiding this comment.
I recommend using _aes.EncryptEcb over an ICryptoTransform, it'll allow you to work with spans instead of arrays.
| byte[] counterBlock = new byte[BlockSize]; | ||
|
|
||
| // Generate KeystreamBufferSize bytes of keystream (256 blocks of 16 bytes each) | ||
| for (int i = 0; i < KeystreamBufferSize; i += BlockSize) |
There was a problem hiding this comment.
Instead of 256 calls to encrypt, you could
- Fill the whole of _keyStreamBuffer with the counter values
- Call Encrypt/Transform just once, writing in-place.
| private static void XorBytes(Span<byte> dest, ReadOnlySpan<byte> src) | ||
| { | ||
| Debug.Assert(dest.Length <= src.Length); | ||
|
|
||
| for (int i = 0; i < dest.Length; i++) | ||
| { | ||
| dest[i] ^= src[i]; | ||
| } | ||
| } |
There was a problem hiding this comment.
This undoubtedly could be vectorized to go faster, if you expect the operation to generally be bigger than a few bytes. Even just doing something like
Span<ulong> destUL = Marshal.As<byte, ulong>(dest);
ReadOnlySpan<ulong> srcUL = Marshal.As<byte, ulong>(src);
XorUL(destUL, srcUL);
for (int i = destUL.Length * sizeof(ulong); i < dest.Length; i++)
{
...
}(Though I think if src/dest isn't QWORD aligned that might do something less happy on ARM?)
There was a problem hiding this comment.
Might be premature, but if we want to vectorize this, then approach with something like Vector<byte> might be better.
| // Validate auth code immediately when we've read all encrypted data | ||
| if (_encryptedDataRemaining <= 0) | ||
| { | ||
| ValidateAuthCode(); |
There was a problem hiding this comment.
In general, this isn't the right approach... you've already written all of the data into the caller buffer before telling them it's bad. But the only way you can avoid that is to give up on streaming, and I assume you don't want to do that.
(With streaming enabled, even if you did buffer.Clear() on failure, someone could already have read all but the last byte on previous calls)
So, no action here, other than "be educated that this is a bad practice, and is why .NET's AES-GCM doesn't allow streaming decryption"
| public static void ExtractToDirectory(this System.IO.Compression.ZipArchive source, string destinationDirectoryName) { } | ||
| public static void ExtractToDirectory(this System.IO.Compression.ZipArchive source, string destinationDirectoryName, bool overwriteFiles) { } | ||
| public static void ExtractToDirectory(this System.IO.Compression.ZipArchive source, string destinationDirectoryName, bool overwriteFiles, string password) { } | ||
| public static void ExtractToDirectory(this System.IO.Compression.ZipArchive source, string destinationDirectoryName, string password) { } |
There was a problem hiding this comment.
How many of these operations could take the password as a ReadOnlySpan (so that it could have been securely read into the stack, or native memory), or (at worst) a ReadOnlyMemory?
String is definitely friendlier, but being able to avoid a heap copy of a secret is a more secure design.
| if (string.IsNullOrEmpty(password)) | ||
| { | ||
| throw new ArgumentNullException(nameof(password), SR.EmptyPassword); | ||
| } |
There was a problem hiding this comment.
These overloads treat an empty password the same as null and throw ArgumentNullException. Since the parameter is non-nullable and empty is a different invalid value, this should be an ArgumentException (or ArgumentException.ThrowIfNullOrEmpty) rather than ArgumentNullException.
| if (string.IsNullOrEmpty(password)) | ||
| { | ||
| throw new ArgumentNullException(nameof(password), SR.EmptyPassword); | ||
| } | ||
|
|
There was a problem hiding this comment.
These overloads treat an empty password the same as null and throw ArgumentNullException. Since the parameter is non-nullable and empty is a different invalid value, this should be an ArgumentException (or ArgumentException.ThrowIfNullOrEmpty) rather than ArgumentNullException.
| if (string.IsNullOrEmpty(password)) | |
| { | |
| throw new ArgumentNullException(nameof(password), SR.EmptyPassword); | |
| } | |
| if (password is null) | |
| { | |
| throw new ArgumentNullException(nameof(password)); | |
| } | |
| if (password.Length == 0) | |
| { | |
| throw new ArgumentException(SR.EmptyPassword, nameof(password)); | |
| } |
| if (!IsEncrypted) | ||
| { | ||
| throw new InvalidDataException(SR.EntryNotEncrypted); | ||
| } | ||
| return await OpenInReadModeAsync(checkOpenable: true, cancellationToken, password.AsMemory()).ConfigureAwait(false); |
There was a problem hiding this comment.
OpenAsync(string password, EncryptionMethod encryptionMethod) in Read mode currently ignores the fact that an encryption method was specified and proceeds to open for reading. This contradicts the sync Open(password, encryptionMethod) behavior (and the SR.EncryptionReadMode message), and makes the API inconsistent. In Read mode this overload should throw InvalidOperationException (SR.EncryptionReadMode), matching the other overloads.
| if (!IsEncrypted) | |
| { | |
| throw new InvalidDataException(SR.EntryNotEncrypted); | |
| } | |
| return await OpenInReadModeAsync(checkOpenable: true, cancellationToken, password.AsMemory()).ConfigureAwait(false); | |
| throw new InvalidOperationException(SR.EncryptionReadMode); |
| using (FileStream fs = new FileStream(extractPath, fileStreamOptions)) | ||
| { | ||
| using (Stream es = !string.IsNullOrEmpty(password) ? source.Open(password) : source.Open()) | ||
| es.CopyTo(fs); |
There was a problem hiding this comment.
The password overload falls back to source.Open()/OpenAsync() when the provided password is empty. Since the API contract is “extract decrypted using the specified password”, accepting an empty string and silently treating it as “no password” is inconsistent (and will also fail for encrypted entries). Prefer validating the password (throw for null/empty) and always calling the password-based Open/OpenAsync overload.
| <data name="InvalidWinZipSize" xml:space="preserve"> | ||
| <value>Stream size is too small for WinZip standard</value> | ||
| </data> | ||
| <data name="InvalidPassword" xml:space="preserve"> | ||
| <value>Invalid password</value> | ||
| </data> | ||
| <data name="WinZipAuthCodeMismatch" xml:space="preserve"> | ||
| <value>Authentication code mismatch for WinZip encrypted entry.</value> | ||
| </data> | ||
| <data name="EntryNotEncrypted" xml:space="preserve"> | ||
| <value>Entry is not encrypted.</value> | ||
| </data> | ||
| <data name="PasswordRequired" xml:space="preserve"> | ||
| <value>A password is required for encrypted entries.</value> | ||
| </data> | ||
| <data name="EmptyPassword" xml:space="preserve"> | ||
| <value>Empty password was provided for encrypting this entry</value> | ||
| </data> | ||
| <data name="InvalidAesStrength" xml:space="preserve"> | ||
| <value>Invalid AES strength value.</value> | ||
| </data> | ||
| <data name="TruncatedZipCryptoHeader" xml:space="preserve"> | ||
| <value>Truncated ZipCrypto header.</value> | ||
| </data> | ||
| <data name="CrcMismatch" xml:space="preserve"> | ||
| <value>CRC mismatch.</value> | ||
| </data> | ||
| <data name="EncryptionNotSpecified" xml:space="preserve"> | ||
| <value>Encryption type is not specified</value> | ||
| </data> | ||
| <data name="EncryptionReadMode" xml:space="preserve"> | ||
| <value>Encryption Method should not be specified in Read Mode</value> | ||
| </data> | ||
| <data name="EncryptionUpdateMode" xml:space="preserve"> | ||
| <value>Encryption Method should not be specified in Update Mode</value> | ||
| </data> | ||
| <data name="WinZipEncryptionNotSupportedOnBrowser" xml:space="preserve"> | ||
| <value>WinZip encryption is not supported on browser</value> | ||
| </data> |
There was a problem hiding this comment.
Several newly added resource strings are inconsistent with existing message style: missing trailing periods, inconsistent capitalization (“Encryption Method”), and slightly ungrammatical phrasing (e.g., “on browser”). It would be better to align with the surrounding resources (sentence case + period) to keep exception messages consistent.
|
|
||
| private void EnsureHeader() | ||
| { | ||
| WriteHeaderCore(isAsync: false).AsTask().GetAwaiter().GetResult(); |
There was a problem hiding this comment.
ZipCryptoStream.EnsureHeader synchronously blocks on a ValueTask by calling AsTask().GetAwaiter().GetResult(), which allocates a Task unnecessarily. ValueTask supports GetAwaiter() directly; using that avoids the allocation on every write path.
| WriteHeaderCore(isAsync: false).AsTask().GetAwaiter().GetResult(); | |
| WriteHeaderCore(isAsync: false).GetAwaiter().GetResult(); |
| EnsureHeader(); | ||
|
|
||
| byte[] tmp = new byte[buffer.Length]; | ||
| for (int i = 0; i < buffer.Length; i++) | ||
| { | ||
| byte ks = DecryptByte(_key2); | ||
| byte p = buffer[i]; | ||
| tmp[i] = (byte)(p ^ ks); | ||
| UpdateKeys(ref _key0, ref _key1, ref _key2, p); | ||
| } | ||
| _base.Write(tmp, 0, tmp.Length); | ||
| } |
There was a problem hiding this comment.
ZipCryptoStream.Write allocates a new byte[] the size of the caller’s buffer on every call, which can be very expensive for typical stream copy buffer sizes (e.g., 80KB). Consider renting from ArrayPool (and returning in a finally) or chunking to a reusable buffer to avoid per-call large allocations.
| public override void Write(ReadOnlySpan<byte> buffer) | ||
| { | ||
| ThrowIfNotWritable(); | ||
| if (!_headerWritten) | ||
| { | ||
| WriteHeader(); | ||
| } | ||
|
|
||
| byte[] workBuffer = new byte[KeystreamBufferSize]; | ||
| WriteCore(buffer, workBuffer); | ||
| } | ||
|
|
||
| public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | ||
| { | ||
| ValidateBufferArguments(buffer, offset, count); | ||
| return WriteAsyncCore(buffer.AsMemory(offset, count), cancellationToken).AsTask(); | ||
| } | ||
|
|
||
| private async ValueTask WriteAsyncCore(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken) | ||
| { | ||
| cancellationToken.ThrowIfCancellationRequested(); | ||
| ThrowIfNotWritable(); | ||
| if (!_headerWritten) | ||
| { | ||
| await WriteHeaderAsync(cancellationToken).ConfigureAwait(false); | ||
| } | ||
|
|
||
| int inputOffset = 0; | ||
| int inputCount = buffer.Length; | ||
| byte[] workBuffer = new byte[KeystreamBufferSize]; | ||
|
|
There was a problem hiding this comment.
WinZipAesStream allocates a new 4KB work buffer on every Write/WriteAsync call. For typical usage (CopyTo with many writes), this becomes a steady allocation stream. Consider using ArrayPool or a reusable instance buffer instead.
| public void WriteBlock(Stream stream) | ||
| { | ||
| Span<byte> buffer = new byte[TotalSize]; | ||
| WriteBlockCore(buffer); | ||
| stream.Write(buffer); | ||
| } | ||
|
|
||
| public async Task WriteBlockAsync(Stream stream, CancellationToken cancellationToken = default) | ||
| { | ||
| byte[] buffer = new byte[TotalSize]; | ||
| WriteBlockCore(buffer); | ||
| await stream.WriteAsync(buffer, cancellationToken).ConfigureAwait(false); | ||
| } | ||
|
|
||
| private void WriteBlockCore(Span<byte> buffer) | ||
| { | ||
| BinaryPrimitives.WriteUInt16LittleEndian(buffer.Slice(0), HeaderId); | ||
| BinaryPrimitives.WriteUInt16LittleEndian(buffer.Slice(2), 7); // DataSize | ||
| BinaryPrimitives.WriteUInt16LittleEndian(buffer.Slice(4), VendorVersion); |
There was a problem hiding this comment.
WinZipAesExtraField.WriteBlock allocates on the heap (new byte[TotalSize]) and WriteBlockCore uses a hard-coded size value (7) even though DataSize exists. Since TotalSize/DataSize are constants and the block is tiny, prefer stackalloc (or reuse DataSize) to avoid allocations and magic numbers.
Fixes #1545
Big Milestones and status: