Description
Background and motivation
By specification, Zip archives support "weak" and "strong" entry-level encryption. Weak encryption of an entry (via the ZipCrypto algorithm) is indicated in the entry-level General Purpose Bit Flag value by setting bit 0 to 1. Strong encryption (e.g. AES-*) is indicated with both bits 0 and 6 (PKZIP) or by setting bit 0 and using an extra data field (WinZip). With any encryption scheme, General Purpose Bit Flag bit 0 will be set if an entry is encrypted.
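The bit tests described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `ZipFlagInspector` class and its method names are hypothetical, and the WinZip scheme (bit 0 plus an extra data field) is only covered by the bit 0 check because this sketch does not parse extra fields.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class ZipFlagInspector
{
    private const ushort EncryptedBit = 1 << 0;        // bit 0: entry is encrypted (any scheme)
    private const ushort StrongEncryptionBit = 1 << 6; // bit 6: PKZIP strong encryption

    // True for any encryption scheme, since bit 0 is always set for encrypted entries.
    public static bool IsEncrypted(ushort generalPurposeBitFlag) =>
        (generalPurposeBitFlag & EncryptedBit) != 0;

    // True only when both bit 0 and bit 6 are set (PKZIP strong encryption).
    public static bool UsesPkzipStrongEncryption(ushort generalPurposeBitFlag)
    {
        const int both = EncryptedBit | StrongEncryptionBit;
        return (generalPurposeBitFlag & both) == both;
    }
}
```

For example, a flag value of `0x0001` would be reported as encrypted but not PKZIP strong-encrypted, while `0x0041` would satisfy both checks.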
Our APIs do not currently expose any indicators for whether an archive contains any weak- or strong-encrypted entries, nor do we expose the General Purpose Bit Flag for advanced scenarios that require checking bits within that value. The end result is that calling code cannot determine if an entry is encrypted. Furthermore, when attempting to extract an encrypted entry from a ZipArchive, two different behaviors are observed:
- For weak (ZipCrypto) encryption, the encrypted data is received as the contents of the entry, and there is no means for identifying that the entry was encrypted. This can lead to invalid or seemingly corrupt data being passed to the next subsystem.
- For strong encryption, an InvalidDataException is thrown when attempting to extract the entry. This exception type is also thrown in other circumstances, and there is no discriminating information to indicate the reason for the exception; thus, there is no means for identifying that the entry was encrypted.
Issue #57197 requests a means for detecting if a ZipArchiveEntry is encrypted while iterating over the entries in the archive.
API Proposal
namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public bool IsEncrypted { get; }
    }
}
API Usage
try
{
    if (!entry.IsEncrypted)
    {
        using (var entryStream = entry.Open())
        {
            entryStream.CopyTo(documentStream);
        }
    }
    else
    {
        // App-specific logic for how to handle/log that the zip contained encrypted entries
    }
}
catch (InvalidDataException)
{
    // This will no longer occur for encrypted entries
}
Additional APIs to Consider
Expose an Archive-Level Property
We could additionally include an archive-level property to indicate whether any of the entries are encrypted. However, since the encryption information is stored in the entry-level General Purpose Bit Flag value, populating this property would require iterating over the entries, as shown in the usage example above. If demand for this archive-level property exists, it could be added to ZipArchive alongside the existing Entries and Mode properties. Because ZipArchive currently exposes no other aggregate information about its entries, I recommend we do not add this property unless sufficient requests are received after adding the entry-level property.
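Until such a property exists, a caller could compute the same aggregate with a small extension method. The sketch below assumes the proposed `ZipArchiveEntry.IsEncrypted` property is available; the `ZipArchiveEncryptionExtensions` class and `HasEncryptedEntries` method are hypothetical names.

```csharp
using System.IO.Compression;
using System.Linq;

// Hypothetical caller-side helper; depends on the IsEncrypted property proposed in this issue.
public static class ZipArchiveEncryptionExtensions
{
    // Enumerates the entries and stops at the first encrypted one.
    // Note: ZipArchive.Entries is not supported in Create mode, so this is
    // intended for archives opened in Read or Update mode.
    public static bool HasEncryptedEntries(this ZipArchive archive) =>
        archive.Entries.Any(entry => entry.IsEncrypted);
}
```

A call site would then read `if (archive.HasEncryptedEntries()) { ... }`, at the cost of one pass over the entry collection.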
using System.Collections.ObjectModel;
namespace System.IO.Compression
{
    public class ZipArchive
    {
        public ReadOnlyCollection<ZipArchiveEntry> Entries { get; }
        public ZipArchiveMode Mode { get; }
        public bool HasEncryptedEntries { get; } // Access would ensure the entries have been enumerated internally
    }
}
Specify Encryption Handling Behavior to ZipFile Accelerators
The ZipFile class provides static accelerator methods, including Open, OpenRead, and ExtractToDirectory. The Open and OpenRead methods simply return a ZipArchive without taking any other action, and they succeed when weak- or strong-encrypted entries are present. The ExtractToDirectory method exhibits the same behavior as described above: an InvalidDataException is thrown for strong-encrypted entries, and weak-encrypted entries are silently extracted with their contents still encrypted.
We could add overloads to ZipFile.ExtractToDirectory to specify how encrypted entries should be handled. I recommend we do not pursue this unless sufficient demand emerges. At that time, we could learn more about the scenarios to determine if we need an enum to define different behaviors or if we merely need to allow a boolean to indicate encrypted entries should be skipped. Without these overloads, a caller can instead use ZipFile.Open[Read] and iterate over the entries.
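The caller-side alternative mentioned above might look like the following sketch. It assumes the proposed `IsEncrypted` property is available; `ZipExtractionSketch` and `ExtractSkippingEncrypted` are hypothetical names, and the "skip and report" policy is just one of the behaviors an overload could offer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper; depends on the IsEncrypted property proposed in this issue.
public static class ZipExtractionSketch
{
    // Extracts every unencrypted entry and returns the full names of the entries that were skipped.
    public static IReadOnlyList<string> ExtractSkippingEncrypted(string archivePath, string destinationDirectory)
    {
        var skipped = new List<string>();
        string root = Path.GetFullPath(destinationDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar))
        {
            root += Path.DirectorySeparatorChar;
        }

        using ZipArchive archive = ZipFile.OpenRead(archivePath);
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (entry.IsEncrypted)
            {
                skipped.Add(entry.FullName); // App-specific handling/logging would go here
                continue;
            }

            // Refuse entries whose path would escape the destination directory.
            string target = Path.GetFullPath(Path.Combine(root, entry.FullName));
            if (!target.StartsWith(root, StringComparison.Ordinal))
            {
                throw new IOException($"Entry '{entry.FullName}' is outside the destination directory.");
            }

            if (entry.Name.Length == 0)
            {
                Directory.CreateDirectory(target); // Directory entry (name ends with a separator)
                continue;
            }

            Directory.CreateDirectory(Path.GetDirectoryName(target) ?? root);
            entry.ExtractToFile(target, overwrite: true);
        }

        return skipped;
    }
}
```

Returning the skipped names leaves the policy decision (fail, warn, or ignore) with the caller, which is the kind of scenario detail an eventual enum-based overload would need to encode.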
Alternative Designs
Exposing the Entry Encryption Algorithm
An alternative design would use an enum to indicate the encryption algorithm used for an entry:
namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ZipEntryEncryption Encryption { get; }
    }
    public enum ZipEntryEncryption
    {
        None,
        Weak, // ZipCrypto
        Aes128,
        Aes192,
        Aes256,
        Unknown = 0xFFFF,
    }
}
For "Add password to ZipArchive" (dotnet/runtime issue #1545), an enum along these lines is very likely to be needed. Without fully designing the APIs for those features, though, it's possible that the enum illustrated above might not be the right design for the full feature set. Since the enum is not strictly necessary for detecting whether there are password-protected/encrypted entries, I recommend we stick with the simple boolean for now. If #1545 is implemented and we end up adding an Encryption property that uses an enum similar to what's shown here, then the IsEncrypted property would become a convenience property over checking whether Encryption is ZipEntryEncryption.None or another value. For the initial implementation, though, IsEncrypted would only need to be based on General Purpose Bit Flag bit 0 being set.
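To illustrate how the boolean could later be layered over such an enum, here is a minimal standalone sketch. The enum is copied from the alternative design above and declared outside `System.IO.Compression` so the example compiles on its own; the extension class is a hypothetical stand-in for the property on `ZipArchiveEntry`.

```csharp
// Standalone copy of the enum from the alternative design above.
public enum ZipEntryEncryption
{
    None,
    Weak, // ZipCrypto
    Aes128,
    Aes192,
    Aes256,
    Unknown = 0xFFFF,
}

// Hypothetical stand-in for ZipArchiveEntry.IsEncrypted being derived from Encryption.
public static class ZipEntryEncryptionExtensions
{
    // Any value other than None, including Unknown, counts as encrypted.
    public static bool IsEncrypted(this ZipEntryEncryption encryption) =>
        encryption != ZipEntryEncryption.None;
}
```

Treating `Unknown` as encrypted keeps the boolean consistent with the bit 0 check even when the specific algorithm cannot be identified.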
Exposing the General Purpose Bit Flag Value
We could optionally expose the General Purpose Bit Flag value on ZipArchiveEntry, either in addition to or instead of IsEncrypted. Exposing this value would allow callers to check other bits to glean more about the archive entry than our APIs support.
namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ushort GeneralPurposeBits { get; }
    }
}
From there, we could also introduce an enum that names each of the bits per the specification. However, there has not been demand for this low-level data to be exposed, so I recommend we do not pursue this approach unless demand emerges.
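As a rough idea of what such an enum might look like, the sketch below names a handful of the bits. It is deliberately partial and the member names are hypothetical; the bit positions follow my reading of the ZIP specification and should be checked against it before any real design work.

```csharp
using System;

// Hypothetical, partial naming of General Purpose Bit Flag bits; not a proposed API.
[Flags]
public enum ZipGeneralPurposeBits : ushort
{
    None = 0,
    Encrypted = 1 << 0,                // bit 0: entry is encrypted
    DataDescriptor = 1 << 3,           // bit 3: sizes and CRC-32 follow the data in a data descriptor
    StrongEncryption = 1 << 6,         // bit 6: PKZIP strong encryption (set together with bit 0)
    LanguageEncoding = 1 << 11,        // bit 11: file name and comment are UTF-8
    LocalHeaderValuesMasked = 1 << 13, // bit 13: local header values masked (central directory encryption)
}

public static class ZipGeneralPurposeBitsExtensions
{
    // Interprets a raw flag value, such as the GeneralPurposeBits property sketched above.
    public static bool IsEncrypted(this ushort rawFlags) =>
        ((ZipGeneralPurposeBits)rawFlags & ZipGeneralPurposeBits.Encrypted) != 0;
}
```

With a raw `ushort` exposed, callers could cast to an enum like this themselves, which is one reason the named enum is not strictly required up front.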
netstandard support via Exception.Data
Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the InvalidDataException was thrown. For instance, Exception.Data could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern.
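A sketch of what this alternative could look like is shown below. Nothing here exists today: the `"IsEncrypted"` key is the one suggested above, and the `EncryptedEntryExceptionSketch` class, with one method standing in for the library side and one for the caller side, is hypothetical.

```csharp
using System.IO;

// Hypothetical illustration of carrying the reason for the failure on Exception.Data.
public static class EncryptedEntryExceptionSketch
{
    public const string IsEncryptedKey = "IsEncrypted";

    // Stand-in for what the library might do when it refuses to extract an encrypted entry.
    public static InvalidDataException CreateForEncryptedEntry(string message)
    {
        var exception = new InvalidDataException(message);
        exception.Data[IsEncryptedKey] = true;
        return exception;
    }

    // Caller-side check; returns false when the key is absent or is not the boolean true.
    public static bool WasCausedByEncryption(InvalidDataException exception) =>
        exception.Data[IsEncryptedKey] is true;
}
```

A caller would use `WasCausedByEncryption` inside a `catch (InvalidDataException ex)` block, which makes the reliance on exceptions for control flow explicit.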
Risks
The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. However, popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted.
Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted (reference). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flag would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.