Ruby Array collect Operation: Practical Patterns, Pitfalls, and Performance

Last month I reviewed a Ruby service that "looked" clean: short methods, nice names, a few tidy loops. But production was slow and memory spikes were embarrassing. The root cause was not a database query or a fancy gem. It was a pile of array transformations written in a way that quietly created extra arrays, mixed side effects into data transforms, and hid nils that should have been errors. The fix was mostly about getting disciplined with one tool: Array#collect.

If you write Ruby for real systems, you spend a lot of time turning data into other data: log lines into structured events, rows into domain objects, payloads into API responses, and feature flags into decisions. collect is the workhorse for that job. It is simple, but it is not "basic" in the sense of "no sharp edges."

I will walk you through what collect really does, how I shape blocks so the intent stays obvious, how it compares to other iteration methods, what I do for common real-world tasks, and where performance surprises show up. I will also share a few 2026-era workflow habits (tests, linting, and AI-assisted reviews) that keep mapping code boring – in a good way.

What collect actually does (and what it does not)

At its core, Array#collect runs the given block once per element and returns a new array containing the block's return values.

  • Input: an array
  • Operation: yield each element to a block
  • Output: a new array of whatever the block returns

That "whatever the block returns" is the key detail. collect is not "modify each element." It is "build a new array from the block's results." If your block returns nil, you will get nil in the output array.

Here is a runnable example with integers:

# frozen_string_literal: true

numbers = [1, 2, 3, 4]

incremented = numbers.collect { |n| n + 1 }
shifted = numbers.collect { |n| n - 5 * 7 }

puts "numbers     : #{numbers.inspect}"
puts "incremented : #{incremented.inspect}"
puts "shifted     : #{shifted.inspect}"

And with strings:

# frozen_string_literal: true

words = ["cat", "rat", "geeks"]

excited = words.collect { |w| w + "!" }
tagged = words.collect { |w| w + "_at" }

puts "words   : #{words.inspect}"
puts "excited : #{excited.inspect}"
puts "tagged  : #{tagged.inspect}"

A few behaviors I always keep in mind:

1) collect returns a new array and does not change the original.

2) If you do not pass a block, collect returns an Enumerator, which is handy for composition.

numbers = [10, 20, 30]

enum = numbers.collect
puts enum.class # Enumerator

p enum.with_index { |n, i| n + i } # => [10, 21, 32]

3) collect is an alias of map in Ruby. I still see both in codebases. I recommend picking one as a team convention. Personally, I default to map in most new code because it reads like a verb (map this to that), but I am perfectly happy when legacy code consistently uses collect.

There is also a fourth behavior worth calling out because it affects how you design APIs: collect preserves order and preserves length. That sounds obvious, but it is why I reach for it when the output must line up with the input (for example, when I am transforming an array of rows that must remain index-aligned with a parallel array).
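A minimal sketch of that guarantee (the names are illustrative):

```ruby
codes = ["a", "b", "c"]

# One output per input, in input order: the result stays index-aligned
# with the source array.
upper = codes.collect { |c| c.upcase }

p upper                        # ["A", "B", "C"]
p upper.length == codes.length # true
```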

Blocks that stay readable under pressure

collect lives or dies by the block. When the block is clear, the transformation is clear. When the block is clever, the code becomes a puzzle.

Keep the return value obvious

Because collect records the block's return value, I try to make that return value the last expression in a short block.

# Good: last expression is the transformed value

sanitized = emails.collect { |email| email.strip.downcase }

If logic gets longer, I switch to do ... end and add a couple of names so future-me does not have to re-parse it.

sanitized = emails.collect do |email|
  cleaned = email.strip.downcase
  cleaned.gsub(/\s+/, "")
end

I also lean on guard clauses inside the block so the "happy path" stays visually dominant.

normalized = users.collect do |user|
  email = user.email
  raise "User #{user.id} missing email" if email.nil?

  email.strip.downcase
end

This is a micro-style choice, but it consistently lowers cognitive load: the transformed value is always the last line.

Avoid side effects in collect

If your block mutates external state (pushing into another array, incrementing counters, logging), you are mixing "transform" and "do stuff." That is when bugs sneak in.

If you truly want side effects, I reach for each.

# Side effects belong in each

errors = []

users.each do |user|
  errors << "missing email for #{user.id}" if user.email.nil?
end

If you want a transformed array and also want to log, I keep the transformation pure and log after, or I wrap the side effect carefully with intent.

normalized = users.collect do |user|
  normalized_email = user.email&.strip&.downcase

  if normalized_email.nil?
    # I would rather raise here than silently insert nils into a response.
    raise "User #{user.id} missing email"
  end

  normalized_email
end

A more subtle side effect is mutating the element itself inside the block. That can be OK when the element is a mutable object (like a hash) and you are intentionally normalizing it. But it is easy to accidentally mutate shared objects (for example, memoized constants or cached objects). If you mutate inside collect, I want that to be unmistakable in the code.
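A hedged sketch of that trap, using throwaway hashes: merge! mutates each element in place (so the input array changes too), while merge builds a fresh hash per element.

```ruby
rows = [{ name: "Ada" }, { name: "Grace" }]

# merge! mutates each element, so the "new" array shares hashes with the input.
tagged = rows.collect { |row| row.merge!(tagged: true) }
p rows.first[:tagged] # true -- the input array's hashes were changed

# merge returns a new hash per element, leaving the input untouched.
fresh_rows = [{ name: "Ada" }, { name: "Grace" }]
fresh = fresh_rows.collect { |row| row.merge(tagged: true) }
p fresh_rows.first.key?(:tagged) # false -- input untouched
```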

Use pattern matching and destructuring when it improves clarity

Modern Ruby makes it easy to destructure arrays and hashes. When the input elements are tuples, destructuring can make your block cleaner.

pairs = [["alice", 3], ["bruno", 7], ["carla", 2]]

labels = pairs.collect do |(name, count)|
  "#{name}: #{count}"
end

p labels # ["alice: 3", "bruno: 7", "carla: 2"]

The litmus test I use: if destructuring reduces indexing like pair[0] and pair[1], it is usually worth it.

For hashes, I often use a light destructure pattern when it reads cleanly, but I avoid going overboard. A big destructure line can hide the actual mapping.
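A small sketch of the level of destructuring I am comfortable with: iterating a hash yields [key, value] pairs, and two block parameters split them cleanly.

```ruby
counts = { "alice" => 3, "bruno" => 7 }

# Each entry arrives as a [key, value] pair; two block params destructure it.
labels = counts.collect { |name, count| "#{name}: #{count}" }

p labels # ["alice: 3", "bruno: 7"]
```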

Symbol-to-proc is fine, but do not force it

collect(&:strip) is pleasant. collect(&:some_long_chain) is not.

names = [" Ada ", "Grace ", " Linus"]

p names.collect(&:strip) # ["Ada", "Grace", "Linus"]

Once you need conditions, error handling, or multiple steps, a normal block reads better.

One extra note: I am cautious with &:method if the array can contain nil values. nil.strip will explode. That is often good (fail fast), but it should be intentional.
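A quick sketch of both behaviors:

```ruby
names = [" Ada ", nil, "Grace "]

# &:strip sends strip to every element, including nil, so this fails fast.
begin
  names.collect(&:strip)
rescue NoMethodError => e
  puts "raised as expected: #{e.class}"
end

# If nils are legitimate data, say so explicitly with safe navigation.
p names.collect { |n| n&.strip } # ["Ada", nil, "Grace"]
```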

Picking the right iterator: collect vs each, select, reduce, and friends

I will be opinionated here because it saves time in code reviews.

  • Use collect/map when you want a 1-to-1 transformation: one input element produces one output element.
  • Use select/filter when you want to keep or drop elements.
  • Use reject when you want to drop elements based on a predicate.
  • Use reduce/inject when you want to fold into a single value (sum, hash, object).
  • Use flat_map when each element can produce zero, one, or many outputs and you want a single flattened array.

This is not just style. It is correctness. Each method communicates a contract about the shape of the output.

The classic bug: using collect to filter

I see this mistake all the time:

# Buggy: produces nils
active_ids = users.collect { |u| u.active? ? u.id : nil }

Now active_ids contains nil entries. That might still "work" until somebody uses it in SQL (WHERE id IN (...)) or builds a JSON response.

Prefer:

active_ids = users.select(&:active?).collect(&:id)

Or (my favorite in many cases):

active_ids = users.filter_map { |u| u.id if u.active? }

filter_map is a great fit when you want to map and filter in one pass.

collect vs flat_map

If your block returns arrays and you want a flat result, use flat_map.

orders = [
  { id: 1001, items: ["keyboard", "mouse"] },
  { id: 1002, items: ["monitor"] }
]

# Wrong shape: an array of arrays
nested = orders.collect { |o| o[:items] }

# Right shape: one flat array of items
all_items = orders.flat_map { |o| o[:items] }

p nested    # [["keyboard", "mouse"], ["monitor"]]
p all_items # ["keyboard", "mouse", "monitor"]

I also watch for this pattern:

items = orders.collect { |o| o[:items] }.flatten

It is readable, but it allocates an intermediate nested array and then allocates again during flattening. flat_map communicates the intent and can be more efficient.

collect vs reduce for hashes

If you are building a hash keyed by something, I recommend to_h or each_with_object over reduce, because they read more directly.

users = [
  { id: 10, email: "[email protected]" },
  { id: 11, email: "[email protected]" }
]

by_id = users.collect { |u| [u[:id], u] }.to_h

p by_id[10] # {:id=>10, :email=>"[email protected]"}

If the transformation is more involved, each_with_object({}) stays clear.

by_email = users.each_with_object({}) do |u, acc|
  acc[u[:email]] = u
end

When I do see reduce({}) used for hash building, it is often fine, but it tends to grow into longer blocks faster. I treat each_with_object as the "boring" default.

Real-world collect patterns I reach for weekly

1) Converting external payloads into a stable shape

When data comes from the outside world, I normalize early so the rest of the app does not have to remember edge cases.

# frozen_string_literal: true

payload = [
  { "user_id" => "42", "email" => " [email protected] ", "roles" => ["admin", "billing"] },
  { "user_id" => "43", "email" => "", "roles" => [] }
]

users = payload.collect do |row|
  email = row.fetch("email", "").strip.downcase

  if email.empty?
    # I fail fast here. Silent nils turn into slow incidents later.
    raise "Invalid email for user_id=#{row["user_id"].inspect}"
  end

  {
    id: Integer(row.fetch("user_id")),
    email: email,
    roles: row.fetch("roles", []).collect(&:to_s)
  }
end

p users

The key idea: collect is a good place to normalize types (strings to integers, trimming, downcasing) as long as you keep the block honest about failures.

A practical detail I add in production code is context-rich error messages. I want to know which payload row failed, not just that something failed.

2) Formatting view models / API response rows

In web work, I often build lightweight hashes for JSON output.

# frozen_string_literal: true

Product = Struct.new(:id, :name, :price_cents, :in_stock)

products = [
  Product.new(1, "Mechanical Keyboard", 12999, true),
  Product.new(2, "USB-C Cable", 1999, false)
]

response = products.collect do |p|
  {
    id: p.id,
    name: p.name,
    price: format("$%.2f", p.price_cents / 100.0),
    availability: p.in_stock ? "in_stock" : "backorder"
  }
end

require "json"

puts JSON.pretty_generate(response)

I keep this kind of mapping close to the boundary layer (controller/serializer). Deep domain code should usually return domain objects, not response hashes.

A small habit that pays off: I keep key names consistent across response builders (id, created_at, status etc.). collect is often where inconsistency is introduced.

3) Parsing and transforming text data

If you process logs or CSV rows, collect is a natural fit.

lines = [
  "2026-02-03T14:12:00Z level=INFO request_id=abc123 duration_ms=18",
  "2026-02-03T14:12:01Z level=ERROR request_id=def456 duration_ms=240"
]

events = lines.collect do |line|
  parts = line.split

  {
    ts: parts[0],
    level: parts[1].split("=").last,
    request_id: parts[2].split("=").last,
    duration_ms: Integer(parts[3].split("=").last)
  }
end

p events

This is intentionally simple; for production parsing you would likely want more robust parsing and validation, but the mapping approach stays the same.

Where I see teams get burned is assuming the input is always clean. For log lines, I often do:

  • Validate the number of fields.
  • Parse with a small helper (parse_kv_pairs) so the mapping block stays readable.
  • Fail with an error that includes the original line.
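A minimal sketch of such a helper. The name parse_kv_pairs comes from the bullet above, but the signature, the field-count check, and the return shape are my assumptions:

```ruby
# Hypothetical helper: validates the field count, then splits key=value tokens.
def parse_kv_pairs(line, expected_fields:)
  parts = line.split
  unless parts.length == expected_fields
    # The failing line is included so the incident is debuggable.
    raise "Malformed log line (#{parts.length} fields): #{line.inspect}"
  end

  # First token is the timestamp; the rest are key=value pairs.
  ts, *pairs = parts
  kv = pairs.to_h { |pair| pair.split("=", 2) }
  { ts: ts }.merge(kv.transform_keys(&:to_sym))
end

event = parse_kv_pairs("2026-02-03T14:12:00Z level=INFO request_id=abc123",
                       expected_fields: 3)
p event[:level]      # "INFO"
p event[:request_id] # "abc123"
```

With a helper like this, the collect block in the log example shrinks to one call per line.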

4) Transforming nested arrays with collect + flat_map

A common workflow is: map, then flatten, then map again.

teams = [
  { name: "Payments", members: [{ name: "Amina" }, { name: "Jon" }] },
  { name: "Search", members: [{ name: "Priya" }] }
]

member_names = teams
  .flat_map { |t| t[:members] }
  .collect { |m| m[:name] }

p member_names # ["Amina", "Jon", "Priya"]

If you see collect returning arrays and then a separate .flatten, I nearly always replace it with flat_map.

5) Working with indexes cleanly

If you need the index, do not manually count.

names = ["Amina", "Jon", "Priya"]

numbered = names.collect.with_index(1) do |name, i|
  "#{i}. #{name}"
end

p numbered

That with_index(1) detail matters: I like one-based numbering for user-facing output, and it stays explicit.

6) Normalizing optional nested fields

A pattern I see constantly: nested data where some keys may be missing. The trick is to avoid a mess of &. chains that hide business rules.

records = [
  { "id" => "1", "profile" => { "tags" => ["New", "VIP"] } },
  { "id" => "2", "profile" => nil },
  { "id" => "3" }
]

normalized = records.collect do |r|
  tags = r.dig("profile", "tags") || []

  {
    id: Integer(r.fetch("id")),
    tags: tags.collect { |t| t.to_s.strip.downcase }.reject(&:empty?)
  }
end

The mapping is still readable because the optionality is handled up front (dig + fallback), then we do a simple collect for the tag transform.

7) Turning objects into lookup tables (and back)

I often use collect to build a table, then map through it.

# Build lookup

by_code = products.collect { |p| [p.code, p] }.to_h

# Resolve codes into products
resolved = codes.collect do |code|
  product = by_code[code]
  raise "Unknown product code=#{code.inspect}" if product.nil?

  product
end

This is one of the cleanest ways to keep collect blocks short: do a preprocessing step once, then map fast.

Common mistakes (and what I do instead)

Mistake 1: Forgetting that collect returns what the block returns

This one hides in plain sight:

values = [1, 2, 3]

result = values.collect do |n|
  puts "processing #{n}" # puts returns nil
end

p result # [nil, nil, nil]

If you want logging while also returning values, return the value explicitly.

result = values.collect do |n|
  puts "processing #{n}"
  n * 10
end

p result # [10, 20, 30]

When the block gets longer, I sometimes make the return explicit by assigning to a name and ending with that name. It is not required in Ruby, but it makes the intention unmissable.
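For instance, the logging block above could name its result:

```ruby
values = [1, 2, 3]

result = values.collect do |n|
  puts "processing #{n}"

  transformed = n * 10
  transformed # naming the value makes the block's return unmissable
end

p result # [10, 20, 30]
```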

Mistake 2: Using collect! (or mutating elements) without meaning to

Ruby also has collect! (and map!), which mutates the array in place.

That can be correct, but in services and libraries I treat it as a "sharp tool." Mutating in place makes code harder to reason about and can cause surprising bugs if the array is shared.

numbers = [1, 2, 3]

numbers.collect! { |n| n + 1 }

p numbers # [2, 3, 4]

When I do use it:

  • The array is local.
  • The method name signals mutation (or the call site is very obvious).
  • I care about reducing temporary allocations in a hot path.

If you are not sure, default to non-mutating collect.

Mistake 3: Returning inconsistent types

I have debugged bugs where half the array contains strings and the other half contains integers because the block had branching returns.

raw = ["42", "", "17"]

# Risky: returns either Integer or nil
ids = raw.collect { |s| s.empty? ? nil : Integer(s) }

p ids # [42, nil, 17]

If empty strings are invalid, I raise. If they are expected, I use filter_map so the output is clean.

ids = raw.filter_map do |s|
  next if s.empty?

  Integer(s)
end

p ids # [42, 17]

Mistake 4: Doing expensive work inside the block repeatedly

If the block compiles a regex, parses a template, or hits the filesystem, you are paying that cost N times.

Bad:

sanitized = inputs.collect { |s| s.gsub(Regexp.new("\\s+"), " ") }

Better:

whitespace = /\s+/

sanitized = inputs.collect { |s| s.gsub(whitespace, " ") }

That one change can shave noticeable time in a tight loop.

A more realistic version of this mistake is repeatedly parsing JSON or repeatedly creating a formatter:

  • Create parsers/formatters once.
  • Pass them in.
  • Keep the mapping block focused on mapping.

Mistake 5: Hiding business decisions behind safe navigation

Safe navigation (&.) is great, but it can quietly turn bad data into nils.

emails = users.collect { |u| u.profile&.email&.strip&.downcase }

If missing emails are an error, this code produces nils and the problem surfaces later. In boundary code, I often prefer a deliberate check:

emails = users.collect do |u|
  email = u.profile&.email
  raise "Missing email for user #{u.id}" if email.nil? || email.strip.empty?

  email.strip.downcase
end

That is longer, but it is honest.

Performance and memory: what matters in real apps

collect is fast enough for the vast majority of code. Still, there are three recurring performance themes.

1) collect allocates a new array

If your array has 100,000 elements, you are creating a new array of 100,000 elements. That is expected. If you chain multiple collect calls, you create multiple intermediate arrays.

For small arrays, I do not care. For large arrays in a hot path, I try to reduce intermediate allocations by combining steps.

Instead of:

emails = users.collect { |u| u.email }
clean = emails.collect { |e| e.strip.downcase }

I prefer:

clean = users.collect { |u| u.email.strip.downcase }

Or if nils are possible:

clean = users.filter_map do |u|
  email = u.email
  next if email.nil?

  email.strip.downcase
end

One of the easiest wins is to stop building arrays you do not need. If the result is only used to update counters, emit metrics, or write output incrementally, an eager collect can be wasteful.
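For example, when all you need is a count, each (or count) avoids the throwaway array entirely:

```ruby
lines = ["ok", "error", "ok", "error"]

# Wasteful: collect would build an array of booleans nobody keeps.
# flags = lines.collect { |l| l == "error" }

# Leaner: iterate for the side effect only...
error_count = 0
lines.each { |l| error_count += 1 if l == "error" }
puts error_count # 2

# ...or ask directly for the single value you wanted.
puts lines.count { |l| l == "error" } # 2
```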

2) Chaining can be clear, but do not build pipelines that hide costs

I like readable pipelines, but I keep the cost in mind:

# Readable, but creates intermediate arrays for each step

result = items
  .collect { |i| i.trimmed_name }
  .reject(&:empty?)
  .collect(&:upcase)

If items is big and you care about memory, you can use a lazy enumerator.

result = items
  .lazy
  .collect { |i| i.trimmed_name }
  .reject(&:empty?)
  .collect(&:upcase)
  .force

The trade-off: lazy pipelines can be a little harder to debug. I use them when I have evidence (profiling, memory graphs, latency) that the eager version is too expensive.

Also note the shape: Array#collect returns an array, but Enumerator::Lazy#collect returns a lazy enumerator until you force. That difference matters when you are refactoring – it is easy to accidentally return a lazy enumerator from a method that callers expect to be an array.
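A small demonstration of that shape difference:

```ruby
eager = [1, 2, 3].collect { |n| n * 2 }
lazy  = [1, 2, 3].lazy.collect { |n| n * 2 }

puts eager.class # Array
puts lazy.class  # Enumerator::Lazy

# Callers that expect an array need the forced version.
p lazy.force # [2, 4, 6]
```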

3) Prefer algorithmic wins over micro-tuning

If you are mapping and then repeatedly looking up items in an array, switching to a hash lookup can dominate any micro improvement around collect.

Example: building a lookup table once.

# Turn records into a lookup hash once

by_id = records.collect { |r| [r.id, r] }.to_h

# Then map inputs through the lookup
resolved = ids.collect do |id|
  record = by_id[id]
  raise "Unknown id=#{id}" if record.nil?

  record
end

That is the kind of change that moves latency from "noticeable" to "boring."

4) When collect! is actually a performance tool

In hot code, avoiding intermediate arrays can matter, and collect! can help. But I only reach for it after I have done the bigger things:

  • Avoid repeated work in the block.
  • Avoid multi-pass pipelines.
  • Avoid avoidable allocations (like temporary strings).

If those are already addressed and the mapping is truly in a hot loop, collect! can be worth it. When I do this, I keep the scope tight so the mutation is not surprising.

5) Strings, freezing, and accidental churn

A lot of memory spikes I see are string churn. collect itself is not the villain; it just makes it easy to create N new strings quickly.

Examples of churn patterns:

  • Repeated + concatenation that creates multiple intermediate strings.
  • Repeated gsub chains.
  • Converting the same value multiple times.

Sometimes the simplest improvement is to reduce the number of string operations per element, or to precompute constants outside the block.
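As a sketch (file names are illustrative), interpolation builds each string in one pass where chained + creates short-lived intermediates:

```ruby
ids = [1, 2, 3]

# Churn: each + allocates an intermediate string per step, per element.
churny = ids.collect { |id| "user" + "-" + id.to_s + ".json" }

# Leaner: a single interpolated string per element.
leaner = ids.collect { |id| "user-#{id}.json" }

p leaner # ["user-1.json", "user-2.json", "user-3.json"]
```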

When I use collect, when I avoid it, and what I recommend instead

Here is the mental model I use in reviews.

Use collect when:

  • You need a 1-to-1 transformation.
  • The block is short and pure (no side effects).
  • The output array is actually needed (you are going to store it, return it, or pass it to another API that expects an array).
  • Order matters and should match input order.

Typical examples:

  • Convert User objects to response hashes.
  • Convert string IDs to integers.
  • Normalize formatting (strip, downcase) across a list.

Avoid collect when:

  • You are not using the returned array.
  • You are doing side effects (logging, pushing into another array, mutating global state).
  • You actually want filtering (use select, reject, or filter_map).
  • You want one result (use reduce, sum, min, max, etc.).
  • You want a streaming pipeline (use each, lazy, or an enumerator depending on the source).

I treat "I used collect because I needed to iterate" as a smell. Ruby gives you a lot of iteration tools; pick the one that describes your intent.

A quick decision table

  • "One in, one out" -> collect/map
  • "Keep some" -> select/filter
  • "Drop some" -> reject
  • "Map + drop nils" -> filter_map
  • "Many out" -> flat_map
  • "Build a hash" -> to_h, each_with_object({})
  • "Just do effects" -> each

This is not about being clever. It is about making code review faster because the method name tells you the shape.

Nils, errors, and designing your mapping contract

This is the part I wish more Ruby teams discussed explicitly: what is the contract of your mapping?

When you write:

emails = users.collect { |u| u.email }

You have implicitly agreed that every user has an email, or you are okay with nils. If that is true, great. If it is not true, you have just created a delayed bug.

Three common contracts (pick one on purpose)

1) "Nils are valid data"

This happens when nil has meaning (for example, optional fields in a response). In this case, collect returning nil is fine, but I still want naming that communicates it.

optional_emails = users.collect { |u| u.email }

2) "Missing data should be dropped"

Then use filter_map.

emails = users.filter_map { |u| u.email&.strip&.downcase }

3) "Missing data is a bug"

Then raise.

emails = users.collect do |u|
  email = u.email
  raise "Missing email for user #{u.id}" if email.nil?

  email.strip.downcase
end

I prefer option (3) inside domain logic and option (2) near user-facing features where partial data is acceptable. The important part is that the choice is explicit.

Use fetch when missing keys are a bug

When mapping hashes, [] returns nil for missing keys. That can hide data drift.

# Risky: missing "user_id" becomes nil, then Integer(nil) blows up later

id = Integer(row["user_id"])

I prefer fetch when the key should exist:

id = Integer(row.fetch("user_id"))

If you want a default, give one deliberately:

roles = row.fetch("roles", [])

Prefer narrow, descriptive errors

When a mapping fails in production, I want the error to answer:

  • Which element failed?
  • Which field failed?
  • What was the raw value?

So I write errors like:

raise "Invalid duration_ms=#{raw.inspect} in line=#{line.inspect}"

It is not fancy, but it makes incidents shorter.

Composing collect with Enumerators (without surprising yourself)

collect returning an enumerator when you omit the block is one of those Ruby features that stays underused.

Building pipelines deliberately

If you have an array and you want to add an index later:

enum = items.collect

result = enum.with_index { |item, i| [i, item] }

This is nice because it composes without creating intermediate arrays until you actually collect.

Lazy composition: good, but be explicit at boundaries

I like lazy enumerators when the dataset is large and the pipeline is multi-step. But I always put a clear boundary where I convert back to an array.

def normalized_emails(users)
  users
    .lazy
    .filter_map { |u| u.email }
    .collect { |e| e.strip.downcase }
    .force
end

That .force is not noise; it is a boundary marker. Without it, the caller might accidentally pass the lazy enumerator into something that expects an array.

A practical rule: do not return lazy enumerators by accident

If a method is named like it returns an array (emails, rows, events), return an array. If you want to return an enumerator, name it like an enumerator (each_email, email_enum) and document it.

Refactoring mapping code without breaking behavior

The most common refactor I do is turning a multi-pass chain into a single-pass collect or filter_map.

Multi-pass -> single pass

Before:

result = users
  .select(&:active?)
  .collect { |u| u.email }
  .collect { |e| e.strip.downcase }

After:

result = users.filter_map do |u|
  next unless u.active?

  email = u.email
  next if email.nil?

  email.strip.downcase
end

This can reduce intermediate arrays and reads like "filter_map active users to normalized emails." The key is to keep it readable: short local names, clear guards, transformed value at the end.

Extract a mapper object when blocks get heavy

If the block grows past what fits comfortably on screen, I stop adding cleverness and extract.

class UserEmailNormalizer
  def call(user)
    email = user.email
    raise "Missing email for user #{user.id}" if email.nil?

    email.strip.downcase
  end
end

normalizer = UserEmailNormalizer.new

emails = users.collect { |u| normalizer.call(u) }

This is not "more enterprise" – it is a way to keep collect readable while still doing real work.

Testing collect code so it stays boring

Mapping code fails in two main ways:

  • The shape changes (keys renamed, types changed).
  • Edge cases creep in (nil, empty strings, unexpected types).

So my tests for collect code target those failure modes.

Test the contract, not the implementation

If I map a payload to a stable shape, I write tests that assert:

  • Output length matches input length (when that is intended).
  • Required keys exist.
  • Types are correct (integer, string, array).
  • Invalid input raises a clear error.

Example (conceptually):

payload = [{ "user_id" => "42", "email" => " [email protected] " }]

users = normalize_payload(payload)

expect(users).to eq([{ id: 42, email: "[email protected]", roles: [] }])

And a failure case:

payload = [{ "user_id" => "42", "email" => "" }]

expect { normalize_payload(payload) }.to raise_error(/Invalid email/)

Those tests act like a guardrail for future refactors.

Add one test for nil behavior (always)

If nils are allowed, I test that nils are allowed.

If nils are not allowed, I test that nils raise.

That one explicit test prevents half of the "why is this nil in production" issues.

Linting and style conventions that reduce mapping bugs

Tooling does not write code for you, but it can keep you from writing the same mistake repeatedly.

Team convention: pick map or collect

Since they are aliases, switching between them inside one codebase just adds noise. I recommend:

  • Use one name consistently.
  • In reviews, ask "is the mapping contract clear?" not "why did you use collect?"

Guard against unused results

One subtle bug is calling collect and ignoring the result (usually you meant each). Linters can catch that kind of thing depending on your setup, but even without a linter I treat it as a review rule:

  • If you call collect, the result should be assigned, returned, or passed along.

Keep blocks small and explicit

When a block has multiple responsibilities (validation + transform + side effects), I expect bugs. Refactor early:

  • Extract helper methods.
  • Extract mapper objects.
  • Use filter_map when you actually want filtering.

AI-assisted reviews in 2026 (how I use them without letting them get sloppy)

I do use AI to review mapping-heavy code, but I treat it like a very fast junior reviewer: good at pointing out patterns, not a source of truth.

My checklist when I ask for review on collect code:

  • "Where can this return nil unexpectedly?"
  • "Where are we doing extra passes that could be one pass?"
  • "Where do we allocate intermediate arrays that we do not need?"
  • "Where are we mutating shared objects inside a block?"

Then I validate with tests and profiling. The big win is not that AI finds deep truths; it is that it forces the code to be explainable.

Practical checklists I use in code review

When I review collect code, I run this mental checklist:

  • Does the method name communicate whether nils are allowed?
  • Is the block pure (or are there hidden side effects)?
  • Is the return value of the block obvious on the last line?
  • Is this really mapping, or is it filtering/folding/flattening?
  • Are we doing multi-pass transformations on a large array without evidence that it is fine?

If the code passes those questions, it is usually production-safe.

A final set of guidelines (the short version)

If you only remember a few things about Array#collect, make it these:

  • collect builds a new array from the block's return values. If the block returns nil, nil goes into the output.
  • Use collect for 1-to-1 transforms; use filter_map for map+filter; use flat_map for map+flatten.
  • Keep collect blocks pure and short. If you need side effects, use each.
  • Be explicit about nils: allow them, drop them, or raise. Do not let them leak in accidentally.
  • Watch for intermediate arrays in chained transforms on big datasets; combine steps or use lazy when you have evidence.

Once you treat collect as a contract – not just a loop – your mapping code becomes easier to read, easier to test, and much less likely to turn into a performance incident.
