Arrays in Ruby provide flexible, ordered data storage, but their permissiveness toward duplicate elements can cause issues. Here we'll thoroughly explore removing these repetitions.

Array Review and Duplicates

First, a quick refresher on Ruby arrays as ordered, integer-indexed collections:

# Create new array
names = ["Bob", "Mary", "Sam", "Mary"]  

# Access elements by index  
names[0] # => "Bob"
names[1] # => "Mary"

Unlike sets (and hash keys), arrays allow duplicate entries like the two "Mary" values above. Now let's systematically inspect de-duplication approaches.

Default Behavior: Indexed by Position

Ruby arrays index elements by position and make no uniqueness guarantees; the same value can appear at any number of indexes.

So relying on arrays for distinct values requires manual de-duping. More on that soon! But first, let's contrast with sets and hashes…

Sets and Hashes Prevent Duplicates

Ruby's sets, and the keys of its hashes, enforce uniqueness:

require 'set'

names_set = Set["Bob", "Mary", "Sam", "Mary"]
# => #<Set: {"Bob", "Mary", "Sam"}> 

symbols = {:bob => 1, :mary => 2} 
# => {:bob=>1, :mary=>2}

Here attempting to insert the same value into a Set twice has no effect, and a Hash cannot hold the same key twice. Both structures handle eliminating duplication internally.
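To see the key-uniqueness constraint in action, assigning to an existing hash key simply replaces its value rather than creating a second entry:

```ruby
# Hash keys are unique: assigning to an existing key overwrites its value
counts = { :bob => 1, :mary => 2 }
counts[:bob] = 3

counts      # => {:bob=>3, :mary=>2} -- still only two entries
counts.size # => 2
```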

But for indexed access, arrays still reign supreme. So let's tackle removing duplicate values within arrays next.

The .uniq Method: One-Line De-Duping

Ruby provides a convenient array method called .uniq used for removing duplicate elements:

names = ["Bob", "Mary", "Sam", "Mary"]  

names.uniq 
# => ["Bob", "Mary", "Sam"]

By calling .uniq on an Array instance, we efficiently strip out any duplicate values in O(n) linear runtime. This returns a new array containing only unique elements from the original.

.uniq Leaves Original Array Unchanged

An important behavior to note – the call doesn't mutate or modify our existing array:

names # => ["Bob", "Mary", "Sam", "Mary"]

Instead, .uniq returns a new de-duplicated array object, leaving the receiver intact.

Leveraging Hashes Internally

Under Ruby's hood, .uniq builds an internal hash table, using each element as a key to track which values have already been seen:

{
  "Bob"  => true,
  "Mary" => true,
  "Sam"  => true
}

Because hash lookups take constant time on average, populating this structure detects duplicates in O(n) time overall.

Finally, it extracts the unique hash keys into a fresh array lacking any repetitions.
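A conceptual pure-Ruby sketch of this strategy (the real implementation lives in C, so treat `my_uniq` as an illustration rather than Ruby's actual code) might look like:

```ruby
# Illustrative re-implementation of Array#uniq using a "seen" hash.
# Each membership check is O(1) on average, giving O(n) overall
# instead of a quadratic nested scan.
def my_uniq(array)
  seen = {}
  result = []
  array.each do |element|
    next if seen.key?(element) # skip elements we've already emitted
    seen[element] = true
    result << element
  end
  result
end

my_uniq(["Bob", "Mary", "Sam", "Mary"])
# => ["Bob", "Mary", "Sam"]
```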

Let's explore an in-place mutation approach next…

Destructive De-Duping with .uniq!

The bang version .uniq! performs the de-duplication in-place by mutating the caller:

names = ["Bob", "Mary", "Sam", "Mary"]  

names.uniq!
# => ["Bob", "Mary", "Sam"]

names 
# => ["Bob", "Mary", "Sam"] 

By modifying the existing array in place to eliminate duplicates, we avoid allocating a second array.
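One documented caveat worth knowing: when the array contains no duplicates, .uniq! returns nil rather than the array, which can break method chaining:

```ruby
# .uniq! returns nil when nothing was removed -- chaining on it is unsafe
[1, 2, 2].uniq!   # => [1, 2]
[1, 2, 3].uniq!   # => nil

# Safer: call .uniq! for its side effect, then use the array itself
nums = [1, 2, 3]
nums.uniq!
nums              # => [1, 2, 3]
```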

Space Savings with Potential Tradeoffs

However, directly changing application state in this manner can introduce risk:

  • Can lead to accidental shared mutation bugs
  • Obfuscates data flow / transformations
  • Increases cognitive load tracking state changes

For these reasons, .uniq is often preferred for its safety and explicitness even with extra object creation.
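A small sketch of the shared-mutation hazard the first bullet describes: two variables referencing the same array object both observe the in-place change:

```ruby
# Two names, one array object -- mutating through either affects both
names      = ["Bob", "Mary", "Mary"]
same_array = names            # no copy is made here

same_array.uniq!
names   # => ["Bob", "Mary"] -- the "untouched" variable changed too

# The non-destructive version leaves the shared object alone
names2  = ["Bob", "Mary", "Mary"]
deduped = names2.uniq
names2  # => ["Bob", "Mary", "Mary"]
```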

Performance: Varies Across Ruby Versions

It's also worth noting that the relative speed of .uniq and .uniq! has shifted across Ruby versions as the shared C implementation behind both methods has been optimized.

In practice the two tend to be close, since both rely on the same internal hash-based duplicate detection; the reliable way to know which wins for your data and Ruby version is to benchmark.

Benchmarking Unique Array Extractions

We can empirically compare performance with Ruby's benchmark library:

require 'benchmark'

arr = [1,2,3,1,5,6,1,3,4,5,7] * 10_000

Benchmark.bm do |x|
  x.report("uniq") { arr.uniq }
  x.report("uniq!") { arr.dup.uniq! }
end

# Sample Results
#
#        user     system      total        real
# uniq   0.010000   0.000000   0.010000 (  0.00902)
# uniq!  0.050000   0.010000   0.060000 (  0.05893)

Here .uniq appears several times faster, but note that the .uniq! timing also includes the cost of .dup (needed so each run starts from an array that still contains duplicates), so the raw numbers overstate the gap between the two methods themselves.

Still, combined with its safer semantics, .uniq is rarely the wrong default. But let's explore more granular custom de-duping behavior next.

Custom De-Duplication Logic via Blocks

While .uniq with no block removes exact duplicate values, we can pass a block to define what counts as a duplicate:

mixed = [1, "1", 2, "2"] 

mixed.uniq { |element| element.class }   
# => [1, "1"]

This calls Object#class on each array value and uses the result as the uniqueness key, building an internal hash similar to:

{
  Integer => 1, 
  String => "1"
}

By keying on class, only the first Integer and the first String are kept.

More Targeted Deduplication

We can implement any custom logic in these blocks though:

names = ["Bob", "bobby", "Bob", "rob"]

names.uniq { |name| name.downcase }
# => ["Bob", "bobby", "rob"]

Here we uniquify names case-insensitively by downcasing each element before the lookup. The sky is the limit for customization!
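As a hypothetical example (the users array and its :email field are invented for illustration), the same block form works for de-duplicating records by a single attribute:

```ruby
# Hypothetical records -- keep only the first user per email address
users = [
  { name: "Bob",   email: "bob@example.com" },
  { name: "Bobby", email: "bob@example.com" },
  { name: "Mary",  email: "mary@example.com" }
]

users.uniq { |user| user[:email] }
# => [{:name=>"Bob", :email=>"bob@example.com"},
#     {:name=>"Mary", :email=>"mary@example.com"}]
```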

Imperative Looping for Manual De-Duping

Of course, we can also iterate manually without Array methods to filter down to unique elements.

For example, here is a basic approach that builds a new array, appending each value only if it has not been collected yet:

arr = ["a", "b", "a", "c"]

unique = []
arr.each do |item|
  unique << item unless unique.include?(item)
end

unique # => ["a", "b", "c"]

This keeps the first occurrence of each value and skips later repeats.

However, this manual loop runs in quadratic O(n²) time, since Array#include? rescans the result on every iteration, instead of the hash-backed O(n) of the specialized methods.

Sets: Automatic Uniqueness

For lightweight storage specifically optimized for distinct values, Ruby's Set class guarantees element uniqueness:

require 'set'

names = Set["Bob", "Mary", "Sam", "Mary"]   
# => #<Set: {"Bob", "Mary", "Sam"}>

The Set data structure internally leverages hash-based lookups to ensure additions of existing values have no effect.
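A quick sketch confirming that behavior, including converting back to an Array when indexed access is needed:

```ruby
require 'set'

s = Set["Bob", "Mary"]
s.add("Mary")  # adding an existing value is a no-op
s.size         # => 2

s.to_a         # => ["Bob", "Mary"] -- back to an Array when needed
```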

However, Sets lack indexed access and much of Array's rich method set. So knowing how to de-duplicate arrays themselves still proves essential.

Summary: Keeping Ruby Arrays Unique

As we've seen, Ruby arrays allow duplicate entries by default unlike Sets and Hashes. Removing these repetitive values requires manual de-duplication.

The .uniq and .uniq! methods provide easy one-liners for eliminating duplication in O(n) linear runtime by leveraging performant hash lookups internally.

Passing custom blocks gives more fine-grained control over uniqueness logic as well. And manual approaches like delete iteration open further customization at the cost of efficiency.

So when dealing with duplicate array values in Ruby, consider .uniq and .uniq! along with blocks and alternative structures like Sets depending on semantics, performance, and safety needs.

Now you have an exhaustive set of techniques for keeping your collections uniquely pristine!
