Fix SourceTextComparer IEqualityComparer contract violation#81000
Fix SourceTextComparer IEqualityComparer contract violation#81000CyrusNajmabadi merged 9 commits intomainfrom
Conversation
Changed GetHashCode to use GetContentHash() instead of GetChecksum(), aligning it with the content-based Equals method. This ensures the IEqualityComparer contract is satisfied: two SourceText instances with the same content but different encodings/BOMs will now have the same hash code. Added comprehensive tests to validate the fix. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
Replaced LINQ Concat with manual byte array creation to avoid potential ambiguity with custom extension methods. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
|
|
||
| namespace Microsoft.CodeAnalysis.UnitTests.Text | ||
| { | ||
| public class SourceTextComparerTests |
| Hash.Combine(contentsHash, | ||
| Hash.Combine(encodingHash, ((int)obj.ChecksumAlgorithm).GetHashCode()))); | ||
| var contentHash = obj.GetContentHash(); | ||
| return !contentHash.IsDefault ? Hash.CombineValues(contentHash) : 0; |
There was a problem hiding this comment.
@copilot You won't need to check for .IsDefault. Also, just use MemoryMarshal.TryRead(contentHash.AsSpan(), ...) to just read out the first 4 bytes of the already strongly hashed bytes into the int you need.
There was a problem hiding this comment.
Done in 97acc72. Removed the IsDefault check and switched to MemoryMarshal.TryRead to read the first 4 bytes directly as an int.
Moved all SourceTextComparer tests from the separate SourceTextComparerTests.cs file into SourceTextTests.cs and deleted the separate test file. Test method names prefixed with "Comparer_" to distinguish them as comparer-specific tests. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
Simplified GetHashCode to use MemoryMarshal.TryRead to directly read the first 4 bytes of the content hash as an int, removing the unnecessary IsDefault check and Hash.CombineValues call. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
| Assert.Equal("span", ex.ParamName); | ||
| } | ||
|
|
||
| [Fact] |
There was a problem hiding this comment.
@copilot update this to [Fact, WorkItem("")] with the write url in the workitem for what is being fixed.
There was a problem hiding this comment.
Done in fc7e4c0. Added WorkItem attributes with the issue URL to all 7 comparer tests.
| Hash.Combine(encodingHash, ((int)obj.ChecksumAlgorithm).GetHashCode()))); | ||
| var contentHash = obj.GetContentHash(); | ||
| MemoryMarshal.TryRead(contentHash.AsSpan(), out int hashCode); | ||
| return hashCode; |
There was a problem hiding this comment.
@copilot actually, can just be MemoryMarshal.Read<int>(... Add comment stating that the content hash is 16 bytes, so reading off the first 4 bytes is always safe.
There was a problem hiding this comment.
Done in c8af757. Changed to MemoryMarshal.Read<int> and added a comment explaining that the 16-byte hash makes reading the first 4 bytes safe.
Added WorkItem attributes referencing issue #71006 to all 7 SourceTextComparer tests to document which issue they validate. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
Changed from MemoryMarshal.TryRead to MemoryMarshal.Read<int> and added a comment explaining that GetContentHash returns a 16-byte hash, making it safe to read the first 4 bytes as an int. Co-authored-by: CyrusNajmabadi <4564579+CyrusNajmabadi@users.noreply.github.com>
…text-comparer-issue
RikkiGibson
left a comment
There was a problem hiding this comment.
LGTM, comment is not blocking.
| // Create source texts from byte arrays where one has a BOM and the other doesn't | ||
| const string content = "Test content"; | ||
|
|
||
| var utf8WithBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true); |
There was a problem hiding this comment.
It looks like this flag only controls the return value of GetPreamble(). You could consider deleting the UTF8Encoding locals and just using Encoding.UTF8 in this test.
Fix SourceTextComparer to use content-based equality consistently
Problem
SourceTextComparer violates the IEqualityComparer contract:
Equalsuses content-based comparison (viaContentEquals)GetHashCodeuses checksum which includes encoding and original bytesSolution
Change
GetHashCodeto useSourceText.GetContentHash()instead of the checksum-based approach. This makes both methods strictly content-based.Changes
GetHashCode()to useGetContentHash()withMemoryMarshal.Read<int>to efficiently read the first 4 bytes of the 16-byte hash as the hash codeTesting Results
This also fixes
SyntaxTreeComparerand improvesSourceTextValueProvidercaching behavior in analyzers and incremental generators.Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.