Arrays in Ruby provide flexible, ordered data storage, but their permissiveness toward duplicate elements can cause issues. Here we'll thoroughly explore removing these repetitions.

Array Review and Duplicates

First, a quick refresher on Ruby arrays as ordered, integer-indexed collections:

# Create new array
names = ["Bob", "Mary", "Sam", "Mary"]  

# Access elements by index  
names[0] # => "Bob"
names[1] # => "Mary"

Unlike sets (and hash keys), arrays allow duplicate entries like the two "Mary" values above. Now let's systematically inspect de-duplication approaches.

Default Behavior: Indexed by Position

Ruby arrays index elements by position and make no uniqueness guarantees; the same value can appear at any number of indexes.

So relying on arrays for distinct values requires manual de-duping. More on that soon! But first, let's contrast with sets and hashes…

Sets and Hashes Prevent Duplicates

Ruby's sets, and the keys of its hashes, enforce uniqueness:

require 'set'

names_set = Set["Bob", "Mary", "Sam", "Mary"]
# => #<Set: {"Bob", "Mary", "Sam"}> 

symbols = {:bob => 1, :mary => 2} 
# => {:bob=>1, :mary=>2}

Here attempting to insert the same value into a Set twice has no effect, and a Hash cannot hold the same key twice. Both structures handle eliminating duplication internally.
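To see the key-uniqueness constraint in action, assigning to an existing hash key simply replaces its value rather than creating a second entry:

```ruby
# Hash keys are unique: assigning to an existing key overwrites its value
counts = { :bob => 1, :mary => 2 }
counts[:bob] = 3

counts      # => {:bob=>3, :mary=>2} -- still only two entries
counts.size # => 2
```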

But for indexed access, arrays still reign supreme. So let's tackle removing duplicate values within arrays next.

The .uniq Method: One-Line De-Duping

Ruby provides a convenient array method called .uniq used for removing duplicate elements:

names = ["Bob", "Mary", "Sam", "Mary"]  

names.uniq 
# => ["Bob", "Mary", "Sam"]

By calling .uniq on an Array instance, we efficiently strip out any duplicate values in O(n) linear runtime. This returns a new array containing only unique elements from the original.

.uniq Leaves Original Array Unchanged

An important behavior to note – the call doesn't mutate or modify our existing array:

names # => ["Bob", "Mary", "Sam", "Mary"]

Instead, .uniq returns a new de-duplicated array object, leaving the receiver intact.

Leveraging Hashes Internally

Under Ruby's hood, .uniq builds an internal hash table, using each element as a key to track which values have already been seen:

{
  "Bob"  => true,
  "Mary" => true,
  "Sam"  => true
}

Because hash lookups take constant time on average, populating this structure detects duplicates in O(n) time overall.

Finally, it extracts the unique hash keys into a fresh array lacking any repetitions.
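A conceptual pure-Ruby sketch of this strategy (the real implementation lives in C, so treat `my_uniq` as an illustration rather than Ruby's actual code) might look like:

```ruby
# Illustrative re-implementation of Array#uniq using a "seen" hash.
# Each membership check is O(1) on average, giving O(n) overall
# instead of a quadratic nested scan.
def my_uniq(array)
  seen = {}
  result = []
  array.each do |element|
    next if seen.key?(element) # skip elements we've already emitted
    seen[element] = true
    result << element
  end
  result
end

my_uniq(["Bob", "Mary", "Sam", "Mary"])
# => ["Bob", "Mary", "Sam"]
```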

Let's explore an in-place mutation approach next…

Destructive De-Duping with .uniq!

The bang version .uniq! performs the de-duplication in-place by mutating the caller:

names = ["Bob", "Mary", "Sam", "Mary"]  

names.uniq!
# => ["Bob", "Mary", "Sam"]

names 
# => ["Bob", "Mary", "Sam"] 

By modifying the existing array in place to eliminate duplicates, we avoid allocating a second array.
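One documented caveat worth knowing: when the array contains no duplicates, .uniq! returns nil rather than the array, which can break method chaining:

```ruby
# .uniq! returns nil when nothing was removed -- chaining on it is unsafe
[1, 2, 2].uniq!   # => [1, 2]
[1, 2, 3].uniq!   # => nil

# Safer: call .uniq! for its side effect, then use the array itself
nums = [1, 2, 3]
nums.uniq!
nums              # => [1, 2, 3]
```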

Space Savings with Potential Tradeoffs

However, directly changing application state in this manner can introduce risk:

  • Can lead to accidental shared mutation bugs
  • Obfuscates data flow / transformations
  • Increases cognitive load tracking state changes

For these reasons, .uniq is often preferred for its safety and explicitness even with extra object creation.
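A small sketch of the shared-mutation hazard the first bullet describes: two variables referencing the same array object both observe the in-place change:

```ruby
# Two names, one array object -- mutating through either affects both
names      = ["Bob", "Mary", "Mary"]
same_array = names            # no copy is made here

same_array.uniq!
names   # => ["Bob", "Mary"] -- the "untouched" variable changed too

# The non-destructive version leaves the shared object alone
names2  = ["Bob", "Mary", "Mary"]
deduped = names2.uniq
names2  # => ["Bob", "Mary", "Mary"]
```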

Performance: Varies Across Ruby Versions

It's also worth noting that the relative speed of .uniq and .uniq! has shifted across Ruby versions as the shared C implementation behind both methods has been optimized.

In practice the two tend to be close, since both rely on the same internal hash-based duplicate detection; the reliable way to know which wins for your data and Ruby version is to benchmark.

Benchmarking Unique Array Extractions

We can empirically compare performance with Ruby's benchmark library:

require 'benchmark'

arr = [1,2,3,1,5,6,1,3,4,5,7] * 10_000

Benchmark.bm do |x|
  x.report("uniq") { arr.uniq }
  x.report("uniq!") { arr.dup.uniq! }
end

# Sample Results
#
#        user     system      total        real
# uniq   0.010000   0.000000   0.010000 (  0.00902)
# uniq!  0.050000   0.010000   0.060000 (  0.05893)

Here .uniq appears several times faster, but note that the .uniq! timing also includes the cost of .dup (needed so each run starts from an array that still contains duplicates), so the raw numbers overstate the gap between the two methods themselves.

Still, combined with its safer semantics, .uniq is rarely the wrong default. But let's explore more granular custom de-duping behavior next.

Custom De-Duplication Logic via Blocks

While .uniq with no block removes exact duplicate values, we can pass a block to define what counts as a duplicate:

mixed = [1, "1", 2, "2"] 

mixed.uniq { |element| element.class }   
# => [1, "1"]

This calls Object#class on each array value and uses the result as the uniqueness key, building an internal hash similar to:

{
  Integer => 1, 
  String => "1"
}

By keying on class, only the first Integer and the first String are kept.

More Targeted Deduplication

We can implement any custom logic in these blocks though:

names = ["Bob", "bobby", "Bob", "rob"]

names.uniq { |name| name.downcase }
# => ["Bob", "bobby", "rob"]

Here we uniquify names case-insensitively by downcasing each element before the lookup. The sky is the limit for customization!
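As a hypothetical example (the users array and its :email field are invented for illustration), the same block form works for de-duplicating records by a single attribute:

```ruby
# Hypothetical records -- keep only the first user per email address
users = [
  { name: "Bob",   email: "bob@example.com" },
  { name: "Bobby", email: "bob@example.com" },
  { name: "Mary",  email: "mary@example.com" }
]

users.uniq { |user| user[:email] }
# => [{:name=>"Bob", :email=>"bob@example.com"},
#     {:name=>"Mary", :email=>"mary@example.com"}]
```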

Imperative Looping for Manual De-Duping

Of course, we can also iterate manually without Array methods to filter down to unique elements.

For example, here is a basic approach that builds a new array, appending each value only if it has not been collected yet:

arr = ["a", "b", "a", "c"]

unique = []
arr.each do |item|
  unique << item unless unique.include?(item)
end

unique # => ["a", "b", "c"]

This keeps the first occurrence of each value and skips later repeats.

However, this manual loop runs in quadratic O(n²) time, since Array#include? rescans the result on every iteration, instead of the hash-backed O(n) of the specialized methods.

Sets: Automatic Uniqueness

For lightweight storage specifically optimized for distinct values, Ruby's Set class guarantees element uniqueness:

require 'set'

names = Set["Bob", "Mary", "Sam", "Mary"]   
# => #<Set: {"Bob", "Mary", "Sam"}>

The Set data structure internally leverages hash-based lookups to ensure additions of existing values have no effect.
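A quick sketch confirming that behavior, including converting back to an Array when indexed access is needed:

```ruby
require 'set'

s = Set["Bob", "Mary"]
s.add("Mary")  # adding an existing value is a no-op
s.size         # => 2

s.to_a         # => ["Bob", "Mary"] -- back to an Array when needed
```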

However, Sets lack indexed access and much of Array's rich method set. So knowing how to de-duplicate arrays themselves still proves essential.

Summary: Keeping Ruby Arrays Unique

As we've seen, Ruby arrays allow duplicate entries by default unlike Sets and Hashes. Removing these repetitive values requires manual de-duplication.

The .uniq and .uniq! methods provide easy one-liners for eliminating duplication in O(n) linear runtime by leveraging performant hash lookups internally.

Passing custom blocks gives more fine-grained control over uniqueness logic as well. And manual approaches like delete iteration open further customization at the cost of efficiency.

So when dealing with duplicate array values in Ruby, consider .uniq and .uniq! along with blocks and alternative structures like Sets depending on semantics, performance, and safety needs.

Now you have an exhaustive set of techniques for keeping your collections uniquely pristine!
