Select-Object -Unique is much slower than Sort-Object -Unique #7707
Closed
Labels
Hacktoberfest: Potential candidate to participate in Hacktoberfest
Issue-Enhancement: the issue is more of a feature request than a bug
Resolution-No Activity: Issue has had no activity for 6 months or more
Up-for-Grabs: Up-for-grabs issues are not high priorities, and may be opportunities for external contributors
WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module
WG-Engine-Performance: core PowerShell engine, interpreter, and runtime performance
Reading a file of ~60,000 lines, picking only unique entries:
(Relevant source code for `Sort-Object -Unique` handling and relevant source code for `Select-Object -Unique` handling. This appears to happen on both PSv5.1 on Windows and PSv6.1-preview 4 on Linux.)
I see that `Select-Object` stores a list of the items it has seen and uses a nested loop to compare every incoming item against every item in that list. It does a full object compare, including every added property, instead of comparing just the property being sorted on. So it is doing more work and should be expected to be slower. Even so, it is so much slower for the case of unique strings, which seems like it would be common but may not be. Could it be sped up?

Would it be reasonable to have it also store a HashSet of something like `obj.ToString()`, and then look up each incoming object in the HashSet? If the value is not there, the object must be new and unique, and it can be output without further work. If the value is in the HashSet, it can do the full comparison. Or would that be too much extra memory use?

Using `Sort-Object` is a workaround if you don't mind the order changing. But if you try `Select-Object` and find it slow, sorting seems like it would add extra work on top and take longer; it's not obvious that it might be ~100x faster.
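
To make the HashSet idea concrete, here is a rough, language-neutral sketch in Python rather than the cmdlet's actual C#. It is not the PowerShell implementation: the function names, the `equal` callback standing in for the full object comparison, and the use of `str` as a stand-in for `obj.ToString()` are all assumptions made for illustration. It also assumes that two objects the full comparison treats as equal always produce the same string, otherwise the pre-filter would let duplicates through.

```python
def unique_nested_loop(items, equal):
    """Baseline: compare each incoming item against every item seen so far.

    Returns the unique items in first-seen order and the number of full
    comparisons performed.
    """
    seen = []
    comparisons = 0
    for item in items:
        is_duplicate = False
        for prior in seen:
            comparisons += 1
            if equal(item, prior):
                is_duplicate = True
                break
        if not is_duplicate:
            seen.append(item)
    return seen, comparisons


def unique_with_prefilter(items, equal, key=str):
    """Hypothetical variant: check a set of string keys before comparing.

    Assumes equal(a, b) implies key(a) == key(b). A key that has never been
    seen means the item is new, so the nested comparison is skipped.
    """
    seen = []
    seen_keys = set()  # extra memory: one key per unique item
    comparisons = 0
    for item in items:
        k = key(item)
        if k in seen_keys:
            # Possible duplicate: fall back to the full comparison.
            is_duplicate = False
            for prior in seen:
                comparisons += 1
                if equal(item, prior):
                    is_duplicate = True
                    break
            if is_duplicate:
                continue
        seen_keys.add(k)
        seen.append(item)
    return seen, comparisons
```

Both functions keep the first-seen order, unlike sorting. On an input of n items that are all distinct, the baseline performs n(n-1)/2 full comparisons while the pre-filtered version performs none, at the cost of holding one extra key per unique item in memory.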