Description
This issue was first reported internally as a bug in .NET Framework, but it is also reproducible in .NET Core.
- .NET Framework code: https://referencesource.microsoft.com/#mscorlib/system/io/binaryreader.cs,344
- .NET Core 2.2 code: dotnet/coreclr/blob/v2.2.6/src/mscorlib/src/System/IO/BinaryReader.cs#L358
- .NET Core 3.0 code: dotnet/coreclr/blob/v3.0.0-preview8.19379.2/src/System.Private.CoreLib/shared/System/IO/BinaryReader.cs#L388
- .NET Core 5.0 code: dotnet/corefx/blob/2fac89f63c644ab81d936d62c91e6003302e718d/src/Common/src/CoreLib/System/IO/BinaryReader.cs#L362
UTF-16
When using a 2-byte encoding (e.g. UTF-16) and _stream.Read() returns an odd number of bytes (as can happen with a network stream), numBytes can be off by one in the next iteration of the loop, so the next read can go past the end of the encoded data in the stream. This can cause an unexpected end of stream, corruption of any subsequent data read from the stream, or both.
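A simplified sketch of the arithmetic (driving a Decoder directly, not the actual ReadChars implementation) shows how one pending byte makes the next read overshoot; the numbers mirror the "hello world" repro below:

```csharp
using System;
using System.Text;

class OffByOneSketch
{
    static void Main()
    {
        // 11 chars -> 22 bytes in UTF-16LE.
        byte[] encoded = Encoding.Unicode.GetBytes("hello world");
        Decoder decoder = Encoding.Unicode.GetDecoder();
        char[] chars = new char[11];

        // Simulate a short read: the first Read returns only 19 of 22 bytes.
        // The decoder produces 9 complete chars and keeps 1 byte of state.
        int produced = decoder.GetChars(encoded, 0, 19, chars, 0);
        Console.WriteLine(produced);            // 9

        // ReadChars now computes numBytes = charsRemaining * 2 = 4 bytes,
        // but only 3 bytes of character data actually remain in the stream,
        // so the read consumes 1 byte of whatever data follows.
        int charsRemaining = 11 - produced;
        Console.WriteLine(charsRemaining * 2);  // 4 bytes requested...
        Console.WriteLine(encoded.Length - 19); // ...but only 3 belong to the chars
    }
}
```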
The repro below should print all "True"s, as it does when a plain MemoryStream is used. However, with a mock NetStream that simulates short reads, data corruption occurs.
Update: It seems that this issue has always been there, so it's not a new regression.
using System;
using System.IO;
using System.Text;

class Program
{
class NetStream : MemoryStream
{
public override int Read(byte[] buffer, int offset, int count)
{
return base.Read(buffer, offset, count > 10 ? count - 3 : count);
}
public override int Read(Span<byte> destination)
{
return base.Read(destination.Length > 10 ? destination.Slice(0, destination.Length - 3) : destination);
}
}
static void Main(string[] args)
{
char[] data1 = "hello world".ToCharArray();
uint data2 = 0xABCDEF01;
uint data3 = 0;
using (Stream stream = new NetStream())
{
using (BinaryWriter writer = new BinaryWriter(stream, Encoding.Unicode, leaveOpen: true))
{
writer.Write(data1);
writer.Write(data2);
writer.Write(data3);
}
Console.WriteLine(stream.Length == data1.Length * sizeof(char) + 2 * sizeof(uint));
stream.Seek(0, SeekOrigin.Begin);
using (BinaryReader reader = new BinaryReader(stream, encoding: Encoding.Unicode, leaveOpen: true))
{
char[] data1b = reader.ReadChars(data1.Length);
Console.WriteLine(data1.AsSpan().SequenceEqual(data1b));
Console.WriteLine(stream.Position == data1.Length * sizeof(char));
uint data2b = reader.ReadUInt32();
Console.WriteLine(data2 == data2b);
Console.WriteLine(stream.Position == data1.Length * sizeof(char) + sizeof(uint));
}
}
}
}
UTF-8
The code below passes on .NET Framework and .NET Core 2.2, but fails on 3.0, so it's a new regression:
using System;
using System.IO;
using System.Text;

class Program
{
// version for UTF-8: over-reads if 2 bytes, 3 chars left
// Fails on .NET Core 3.0, passes on 2.2.
static void Main(string[] args)
{
char[] data1 = "hello world 😃x".ToCharArray(); // 14 code points, 15 chars in UTF-16, 17 bytes in UTF-8
uint data2 = 0xABCDEF01;
uint data3 = 0;
using (Stream stream = new MemoryStream())
{
using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
{
writer.Write(data1);
writer.Write(data2);
writer.Write(data3);
}
Console.WriteLine(stream.Length == (data1.Length - 1) + 3 + 2 * sizeof(uint));
stream.Seek(0, SeekOrigin.Begin);
using (BinaryReader reader = new BinaryReader(stream, encoding: Encoding.UTF8, leaveOpen: true))
{
char[] data1b = reader.ReadChars(data1.Length);
Console.WriteLine(data1.AsSpan().SequenceEqual(data1b));
Console.WriteLine(stream.Position == (data1.Length - 1) + 3);
uint data2b = reader.ReadUInt32();
Console.WriteLine(data2 == data2b);
Console.WriteLine(stream.Position == (data1.Length - 1) + 3 + sizeof(uint));
}
}
}
}
Remarks
- Both versions of BinaryReader fail the UTF-16 repro.
- The UTF-8 case does not need a short read (i.e. a network stream); it fails deterministically on any kind of underlying stream.
- The UTF-8 code seems to fail on 3.0 but pass on 2.2 because the 2.2 runtime uses a version of BinaryReader that has a special case for handling this edge case for single-byte encodings:
The special case in 2.2: dotnet/coreclr/blob/ce1d090d33b400a25620c0145046471495067cc7/src/mscorlib/src/System/IO/BinaryReader.cs#L377-L386
...
numBytes = charsRemaining;
// special case for DecoderNLS subclasses when there is a hanging byte from the previous loop
DecoderNLS decoder = _decoder as DecoderNLS;
if (decoder != null && decoder.HasState && numBytes > 1)
{
numBytes -= 1;
}
if (_2BytesPerChar)
...

No special case in 3.0 and 5.0: dotnet/coreclr/blob/9642de76d4f5e563150a90f5923b304d87d7a8b1/src/System.Private.CoreLib/shared/System/IO/BinaryReader.cs#L387-L389
...
numBytes = charsRemaining;
if (_2BytesPerChar)
...
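Until this is fixed, callers that can compute the encoded byte count up front can avoid ReadChars entirely. A sketch of a workaround for fixed-width encodings such as UTF-16 (the helper name and signature are ours, not part of the BinaryReader API):

```csharp
using System.IO;
using System.Text;

static class BinaryReaderWorkaround
{
    // Hypothetical helper: reads exactly charCount * bytesPerChar bytes first,
    // so the decoder can never consume data belonging to subsequent fields.
    // Only valid for fixed-width encodings (e.g. UTF-16 without surrogates
    // spanning the boundary is not an issue here, since all bytes are read).
    public static char[] ReadCharsFixedWidth(
        BinaryReader reader, int charCount, Encoding encoding, int bytesPerChar)
    {
        int byteCount = charCount * bytesPerChar;
        // ReadBytes loops internally until byteCount bytes are read or EOF,
        // so short reads from a network stream are handled correctly.
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException();
        return encoding.GetChars(bytes);
    }
}
```

Because the byte count is fixed before any decoding happens, the stream position always lands exactly at the end of the character data, regardless of how the underlying Read calls are chunked.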