I opened an identical issue (erroneously, it turns out) on dotnet/corefx; I'm posting it here and closing that one to fix my error.
Another bit of odd behavior with Encoder, this time with the Latin1 encoding.
For the "naughty" string @"0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟", calling Encoding.GetBytes(...) produces a different result than iteratively calling Encoder.Convert(...). I have a repository with a reproduction (it also shows off issue #42750, but I suspect these are unrelated issues; I just happened to find them at the same time).
For that string, the issue seems to be that Encoder doesn't write the final byte during flushing.
Latin1 is a sufficiently weird encoding that this may be expected?
A smaller reproduction:
var encoding = Encoding.GetEncoding("iso-8859-1");
var text = @"0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟";
var destBufferSize = 2;
var encodingBytes = encoding.GetBytes(text);
var encoder = encoding.GetEncoder();
var chars = text.ToCharArray();
var sourceSpan = chars.AsSpan();
var destSpan = new byte[destBufferSize].AsSpan();
var encoderBytes = new List<byte>();
var completed = false;
// write everything in sourceSpan
while (!completed)
{
var flush = sourceSpan.Length == 0;
encoder.Convert(sourceSpan, destSpan, flush, out var charsConsumed, out var bytesProduced, out completed);
encoderBytes.AddRange(destSpan.Slice(0, bytesProduced).ToArray());
sourceSpan = sourceSpan.Slice(charsConsumed);
}
var eq = encodingBytes.SequenceEqual(encoderBytes);
if (eq)
{
return;
}
var encodingAsStr = encoding.GetString(encodingBytes);
var encoderAsStr = encoding.GetString(encoderBytes.ToArray());
Console.WriteLine($@"Encoding Convert failure for destBufferSize={destBufferSize} - {encodingBytes.Length}:""{encodingAsStr}"" vs {encoderBytes.Count}:""{encoderAsStr}""");
This prints when it (probably) shouldn't.
This behavior is quite sensitive to the value of destBufferSize, happening for this particular string at values 2, 4, 5, 8, 10, 20, 40, and 41 (note that 42 is the size the destination buffer needs to be for a single Convert call to be sufficient).
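To check which buffer sizes are affected on a given runtime, the repro above can be wrapped in a loop over destBufferSize. The following is only a sketch: the class name Latin1Sweep and the helper EncodeChunked are hypothetical names introduced here, the Convert loop is copied from the repro unchanged, and which sizes (if any) get reported will depend on the runtime version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class Latin1Sweep
{
    // Hypothetical helper (not part of the original report): encodes `text`
    // through Encoder.Convert with a destination buffer of `destBufferSize`
    // bytes, using the same loop as the repro.
    static byte[] EncodeChunked(Encoding encoding, string text, int destBufferSize)
    {
        var encoder = encoding.GetEncoder();
        ReadOnlySpan<char> sourceSpan = text.AsSpan();
        Span<byte> destSpan = new byte[destBufferSize];
        var encoderBytes = new List<byte>();
        var completed = false;

        while (!completed)
        {
            // Only ask for a flush once every char has been handed to the encoder.
            var flush = sourceSpan.Length == 0;
            encoder.Convert(sourceSpan, destSpan, flush, out var charsConsumed, out var bytesProduced, out completed);
            encoderBytes.AddRange(destSpan.Slice(0, bytesProduced).ToArray());
            sourceSpan = sourceSpan.Slice(charsConsumed);
        }

        return encoderBytes.ToArray();
    }

    static void Main()
    {
        var encoding = Encoding.GetEncoding("iso-8859-1");
        var text = @"0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟";
        var expected = encoding.GetBytes(text);

        // Try every buffer size up to the single-call size and report mismatches.
        for (var size = 1; size <= expected.Length; size++)
        {
            var actual = EncodeChunked(encoding, text, size);
            if (!expected.SequenceEqual(actual))
            {
                Console.WriteLine($"destBufferSize={size}: GetBytes produced {expected.Length} bytes, Encoder.Convert produced {actual.Length}");
            }
        }
    }
}
```

If the loop prints nothing, Encoding.GetBytes and the chunked Encoder.Convert calls agree for every buffer size tried on that runtime.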