SQL UNION ALL: Keep Every Row, Keep Control

A few months ago I was helping a retail analytics team reconcile daily sales from three systems: in‑store POS, mobile checkout, and an older web storefront. Each system produced similar rows, but not identical. The data pipeline had been using UNION and, without anyone noticing, it was quietly dropping duplicates that were actually legitimate repeat purchases. The report looked “clean,” but revenue was short. I replaced UNION with UNION ALL, tightened the query structure, and the numbers lined up on the next run. That small change paid for itself in a single afternoon.

That’s the power of UNION ALL: it combines result sets fast, keeps every row, and avoids the hidden costs of duplicate removal. If you do any reporting, ETL, or analytics work, you’ll run into the question, “Should I remove duplicates or keep them?” I want you to be able to answer that confidently, with real SQL patterns you can run today. I’ll walk you through how UNION ALL behaves, when it’s the right choice, and how to prevent common mistakes that lead to silent data loss or ugly performance surprises.

What UNION ALL really does (and why that matters)

UNION ALL stacks result sets vertically. If you run two SELECT statements, their rows are concatenated into one output. There’s no de‑duplication step, no implicit sorting, and no surprise filter: you get exactly the rows each SELECT produces, although the order they come back in is not guaranteed without an outer ORDER BY. That’s why UNION ALL is typically faster than UNION: the database engine can stream each result without the extra work of comparing rows to remove duplicates.

I like to think of UNION ALL as stacking receipts on a counter. You don’t compare every receipt to every other receipt to remove “duplicates.” You just add them to the pile. That mental model makes it clear why performance improves: there’s no need for a distinct operation.

Syntax recap

You already know the basic shape, but it’s worth stating precisely:

SELECT col1, col2, col3
FROM source_a
UNION ALL
SELECT col1, col2, col3
FROM source_b;

Each SELECT must return the same number of columns and compatible data types for each position. Names don’t have to match, but I strongly recommend using aliases so the final output is readable.

De‑duplication is your responsibility

Because UNION ALL never removes duplicates, any duplicate control needs to be explicit. If you truly need de‑duplication, you can:

  • Switch to UNION (which performs distinct)
  • Wrap the UNION ALL inside a SELECT DISTINCT
  • Use a GROUP BY on a unique key

You should make that decision intentionally, not by accident.
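Here is a minimal sketch that makes the difference concrete. It uses Python’s built‑in sqlite3 module as a stand‑in engine. The tables (pos_payments, mobile_payments) and their rows are hypothetical: customer 1 legitimately paid 500 cents twice, which is the same situation as the retail story above. The sketch compares UNION ALL, UNION, and SELECT DISTINCT over UNION ALL.

```python
import sqlite3

# Hypothetical tables and rows; SQLite stands in for your engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_payments (customer_id INTEGER, amount_cents INTEGER);
    CREATE TABLE mobile_payments (customer_id INTEGER, amount_cents INTEGER);
    -- Customer 1 legitimately paid 500 cents twice: a real repeat purchase.
    INSERT INTO pos_payments VALUES (1, 500), (1, 500), (2, 1200);
    INSERT INTO mobile_payments VALUES (3, 800);
""")

BRANCHES = """
    SELECT customer_id, amount_cents FROM pos_payments
    {op}
    SELECT customer_id, amount_cents FROM mobile_payments
"""

def total(inner_sql):
    """Sum amount_cents over whatever rows the inner query returns."""
    return conn.execute(f"SELECT SUM(amount_cents) FROM ({inner_sql})").fetchone()[0]

union_all_total = total(BRANCHES.format(op="UNION ALL"))  # keeps every row
union_total = total(BRANCHES.format(op="UNION"))          # drops the repeat
# SELECT DISTINCT over UNION ALL removes exactly what UNION removes.
distinct_total = total("SELECT DISTINCT * FROM (" + BRANCHES.format(op="UNION ALL") + ")")

print(union_all_total, union_total, distinct_total)  # 3000 2500 2500
```

UNION drops 500 cents here, and that is exactly the kind of silent revenue gap described at the start. The DISTINCT variant removes the same rows. It does not change the outcome, but it makes the de‑duplication decision visible in the query.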

UNION vs UNION ALL: a clear recommendation

You’re probably deciding between UNION and UNION ALL. Here is how I recommend you choose in practice:

  • Use UNION ALL by default when you are combining sources that are naturally disjoint or when duplicates are meaningful
  • Use UNION only when you are 100% sure duplicates should be removed and you want that baked into the query

Here’s a direct comparison:

  • Stacking partitioned tables (monthly tables, shards): use UNION ALL, because partitions are disjoint by design.
  • Combining event logs from different services: use UNION ALL, because duplicate events are meaningful and should be investigated, not dropped.
  • Merging overlapping lookup tables: use UNION (or DISTINCT), because duplicates are typically noise and should be removed.
  • Deduplicating customer rows across sources: use UNION ALL plus an explicit rule, because you need deterministic logic, not implicit removal.

If you are unsure, I recommend UNION ALL plus explicit de‑duplication logic. It keeps control in your hands and makes the query’s intent clear to anyone reading it later.

Real‑world patterns that rely on UNION ALL

Let me show you patterns I use weekly. Each one is runnable and designed for real data, not toy tables.

1) Appending time‑partitioned tables

Many teams store data in monthly tables for performance or retention reasons. To query a range, you stack the tables:

SELECT order_id, customer_id, order_total, order_ts
FROM orders202511
UNION ALL
SELECT order_id, customer_id, order_total, order_ts
FROM orders202512
UNION ALL
SELECT order_id, customer_id, order_total, order_ts
FROM orders202601
-- This WHERE belongs to the last SELECT only; it trims the open-ended partition.
WHERE order_ts < '2026-02-01';

This is a clean use of UNION ALL because the partitions are intended to be mutually exclusive. In fact, if you used UNION here, you’d be paying extra cost for a problem that shouldn’t exist.

2) Combining event streams with a source tag

When you merge data from multiple systems, always include a source tag. It makes auditing and debugging far easier.

SELECT
    event_id,
    user_id,
    event_type,
    event_ts,
    'web' AS source
FROM web_events
UNION ALL
SELECT
    event_id,
    user_id,
    event_type,
    event_ts,
    'mobile' AS source
FROM mobile_events;

If you see duplicates later, you can quickly identify whether they are cross‑system duplicates or real repeated events.

3) Build a change log from operational tables

You can use UNION ALL to convert multiple operational tables into a single, consistent change log. This works well for audit dashboards.

SELECT
    user_id,
    'profile_update' AS change_type,
    updated_at AS change_ts
FROM user_profiles
WHERE updated_at >= '2026-01-01'
UNION ALL
SELECT
    user_id,
    'password_reset' AS change_type,
    reset_at AS change_ts
FROM password_resets
WHERE reset_at >= '2026-01-01'
UNION ALL
SELECT
    user_id,
    'email_verify' AS change_type,
    verified_at AS change_ts
FROM email_verifications
WHERE verified_at >= '2026-01-01';

This is a simple pattern with a big impact: you get a consolidated timeline without building a separate pipeline.

4) Recovering from duplicate removal mistakes

If a report used UNION but duplicates are valid, you can correct it without rebuilding the report from scratch.

WITH all_rows AS (
    SELECT order_id, customer_id, order_total, order_ts
    FROM pos_orders
    UNION ALL
    SELECT order_id, customer_id, order_total, order_ts
    FROM mobile_orders
)
SELECT *
FROM all_rows
WHERE order_ts >= '2026-01-01';

Notice the CTE: it makes the change explicit and easy to review.

Common mistakes and how I avoid them

UNION ALL is simple, but the mistakes are subtle because SQL won’t always complain. Here are the ones I still see in code reviews.

Mistake 1: Misaligned columns

If you reorder columns in one SELECT but not the others, the query may still run, depending on the column types and the engine, and produce nonsense. I avoid this by using explicit column lists and consistent aliases in every SELECT.

Bad:

SELECT user_id, email, created_at FROM users
UNION ALL
SELECT email, user_id, created_at FROM users_archive;

Better:

SELECT user_id, email, created_at FROM users
UNION ALL
SELECT user_id, email, created_at FROM users_archive;

If you must transform columns, do it explicitly in every SELECT.
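Here is a small sketch of how far a misaligned query can get. It uses Python’s sqlite3 module with hypothetical users and users_archive tables. PostgreSQL would reject this particular query because it can’t match an integer column with a text column. SQLite’s dynamic typing accepts it, so the values simply land in the wrong columns.

```python
import sqlite3

# Hypothetical tables mirroring the "Bad" example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, email TEXT, created_at TEXT);
    CREATE TABLE users_archive (user_id INTEGER, email TEXT, created_at TEXT);
    INSERT INTO users VALUES (1, 'a@example.com', '2026-01-05');
    INSERT INTO users_archive VALUES (2, 'b@example.com', '2025-03-10');
""")

# Misaligned: the second SELECT swaps user_id and email.
# SQLite raises no error; the archive row's email ends up under user_id.
bad_rows = conn.execute("""
    SELECT user_id, email, created_at FROM users
    UNION ALL
    SELECT email, user_id, created_at FROM users_archive
""").fetchall()

# Aligned: same column order in every SELECT.
good_rows = conn.execute("""
    SELECT user_id, email, created_at FROM users
    UNION ALL
    SELECT user_id, email, created_at FROM users_archive
""").fetchall()

print(bad_rows)
print(good_rows)
```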

Mistake 2: Type mismatches that coerce data

If one SELECT has an integer and the other has a string in the same column position, the engine may reject the query or silently coerce both to a common type, sometimes a string. Coercion can break sorting, filtering, and numeric aggregation. Fix it with explicit casts:

SELECT CAST(total_cents AS BIGINT) AS total_cents, order_id
FROM pos_orders
UNION ALL
SELECT CAST(total_cents AS BIGINT) AS total_cents, order_id
FROM mobile_orders;

I also recommend testing with a small sample to verify the output types.
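This sketch shows why the casts matter. It uses Python’s sqlite3 module and hypothetical tables: one source stores total_cents as INTEGER and the other stores it as TEXT. SQLite doesn’t coerce here the way some engines do. Instead it keeps each value’s storage class and sorts every integer before every string. The mechanism differs, but the symptom is the one described above: broken ordering. SQLite also accepts BIGINT as a type name, but the sketch uses INTEGER for brevity.

```python
import sqlite3

# Hypothetical schemas: one source stores cents as INTEGER, the other as TEXT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER, total_cents INTEGER);
    CREATE TABLE mobile_orders (order_id INTEGER, total_cents TEXT);
    INSERT INTO pos_orders VALUES (1, 1000);
    INSERT INTO mobile_orders VALUES (2, '900');
""")

# Without casts, SQLite sorts all INTEGER values before all TEXT values,
# so 1000 comes out ahead of '900'.
uncast = conn.execute("""
    SELECT total_cents FROM pos_orders
    UNION ALL
    SELECT total_cents FROM mobile_orders
    ORDER BY total_cents
""").fetchall()

# With explicit casts, both branches yield integers and sort numerically.
cast = conn.execute("""
    SELECT CAST(total_cents AS INTEGER) AS total_cents FROM pos_orders
    UNION ALL
    SELECT CAST(total_cents AS INTEGER) AS total_cents FROM mobile_orders
    ORDER BY total_cents
""").fetchall()

print(uncast)  # [(1000,), ('900',)]
print(cast)    # [(900,), (1000,)]
```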

Mistake 3: Hidden duplicates from JOINs

Sometimes duplicates originate not from the UNION ALL itself but from JOINs inside each SELECT. If you’re merging data from multiple sources and see an unexpected row explosion, check the JOINs first.

I often use this quick diagnostic to find duplicate sources:

SELECT source, COUNT(*) AS row_count
FROM (
    SELECT 'pos' AS source, order_id FROM pos_orders
    UNION ALL
    SELECT 'mobile' AS source, order_id FROM mobile_orders
) x
GROUP BY source;

This tells me which side is producing too many rows.
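One possible refinement of that diagnostic is to count distinct keys next to raw rows for each source. When the two numbers differ, a JOIN is fanning out. This refinement is offered as an assumption, not as part of the original query. The sketch uses Python’s sqlite3 module, and the pos_payments table with a split payment on order 1 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER);
    CREATE TABLE pos_payments (order_id INTEGER, method TEXT);
    CREATE TABLE mobile_orders (order_id INTEGER);
    INSERT INTO pos_orders VALUES (1), (2);
    -- Order 1 has two payment rows, so joining to it duplicates the order.
    INSERT INTO pos_payments VALUES (1, 'cash'), (1, 'card'), (2, 'card');
    INSERT INTO mobile_orders VALUES (10), (11);
""")

rows = conn.execute("""
    SELECT source,
           COUNT(*) AS row_count,
           COUNT(DISTINCT order_id) AS distinct_orders
    FROM (
        SELECT 'pos' AS source, o.order_id AS order_id
        FROM pos_orders o
        JOIN pos_payments p ON p.order_id = o.order_id
        UNION ALL
        SELECT 'mobile' AS source, order_id FROM mobile_orders
    ) x
    GROUP BY source
""").fetchall()

report = {source: (n, d) for source, n, d in rows}
# More rows than distinct keys means something upstream is fanning out.
fanned_out = sorted(s for s, (n, d) in report.items() if n > d)
print(report, fanned_out)
```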

Mistake 4: ORDER BY in the wrong place

You can’t ORDER BY inside each SELECT and expect the final output to be ordered. Sorting has to happen at the end. If you need order, do this:

SELECT *
FROM (
    SELECT order_id, order_ts, 'pos' AS source FROM pos_orders
    UNION ALL
    SELECT order_id, order_ts, 'mobile' AS source FROM mobile_orders
) all_orders
ORDER BY order_ts DESC;

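An ORDER BY attached to an individual SELECT is a syntax error in some engines. In others, when that SELECT is wrapped in parentheses or a subquery, it is not guaranteed to affect the final order; MySQL, for instance, discards it unless it is paired with LIMIT.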
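Here is how this plays out in one engine. The sketch uses Python’s sqlite3 module and hypothetical pos_orders and mobile_orders tables. SQLite refuses an ORDER BY attached to the first branch, while the outer ORDER BY from the pattern above works as intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER, order_ts TEXT);
    CREATE TABLE mobile_orders (order_id INTEGER, order_ts TEXT);
    INSERT INTO pos_orders VALUES (1, '2026-01-03'), (2, '2026-01-01');
    INSERT INTO mobile_orders VALUES (3, '2026-01-02');
""")

# An ORDER BY on a non-final branch is rejected when the statement is prepared.
try:
    conn.execute("""
        SELECT order_id, order_ts FROM pos_orders ORDER BY order_ts
        UNION ALL
        SELECT order_id, order_ts FROM mobile_orders
    """)
    inner_order_error = None
except sqlite3.OperationalError as exc:
    inner_order_error = str(exc)

# Ordering the combined result in the outer query is portable.
ordered = conn.execute("""
    SELECT * FROM (
        SELECT order_id, order_ts, 'pos' AS source FROM pos_orders
        UNION ALL
        SELECT order_id, order_ts, 'mobile' AS source FROM mobile_orders
    ) all_orders
    ORDER BY order_ts DESC
""").fetchall()

print(inner_order_error)
print(ordered)
```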

Mistake 5: Forgetting deterministic rules when de‑duplicating

If you combine UNION ALL with a later DISTINCT or GROUP BY, be careful: without a clear rule you can discard important details. For example, if you GROUP BY customer_id but ignore updated_at, you might lose the freshest data.

A safer pattern is to rank first:

WITH combined AS (
    SELECT customer_id, email, updated_at, 'crm' AS source
    FROM crm_customers
    UNION ALL
    SELECT customer_id, email, updated_at, 'billing' AS source
    FROM billing_customers
), ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM combined
)
SELECT customer_id, email, updated_at
FROM ranked
WHERE rn = 1;

This makes the choice visible and repeatable.
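This sketch checks the ranking logic on a handful of rows by running the same query through Python’s sqlite3 module. Window functions need SQLite 3.25 or newer, which recent Python builds bundle. The crm_customers and billing_customers rows are hypothetical: customer 1 has an older CRM email and a newer billing email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, email TEXT, updated_at TEXT);
    CREATE TABLE billing_customers (customer_id INTEGER, email TEXT, updated_at TEXT);
    INSERT INTO crm_customers VALUES
        (1, 'old@example.com', '2026-01-02'),
        (2, 'two@example.com', '2026-01-09');
    INSERT INTO billing_customers VALUES
        (1, 'new@example.com', '2026-01-15');
""")

latest = conn.execute("""
    WITH combined AS (
        SELECT customer_id, email, updated_at, 'crm' AS source FROM crm_customers
        UNION ALL
        SELECT customer_id, email, updated_at, 'billing' AS source FROM billing_customers
    ), ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM combined
    )
    SELECT customer_id, email, updated_at, source
    FROM ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(latest)
```

Two sources can share the same updated_at. In that case, add a tie‑breaker such as source to the ORDER BY inside the window. Otherwise the engine decides which row wins, and the result is not repeatable.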

Performance: what I actually watch in production

UNION ALL is usually faster than UNION, but I don’t rely on that assumption. I look at three things:

1) Row volume: If you’re stacking tens of millions of rows, UNION ALL can be dramatically faster because there’s no distinct step.

2) Memory usage: UNION may require a large hash set or sort to remove duplicates, which can spill to disk.

3) Parallel execution: Many engines can execute each SELECT in parallel and then concatenate the results. UNION ALL benefits more from this pattern because the branches never have to be compared against each other.

Here’s a concrete example using a realistic query design:

WITH combined AS (
    SELECT order_id, customer_id, order_total, order_ts
    FROM pos_orders
    WHERE order_ts >= '2026-01-01'
    UNION ALL
    SELECT order_id, customer_id, order_total, order_ts
    FROM mobile_orders
    WHERE order_ts >= '2026-01-01'
)
SELECT customer_id, SUM(order_total) AS total_spend
FROM combined
GROUP BY customer_id;

By filtering in each SELECT, you reduce the volume before stacking, which is crucial. In practice, for mid‑sized reports I typically see this version run 10–50 ms faster than the same query written with UNION, and the gap grows as data scales.
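Filtering early changes how much work the engine does, not the answer. The sketch below uses Python’s sqlite3 module and hypothetical order tables. It runs the same report twice, once with the filter inside each branch and once with a single filter after the union, and the totals match. Whether an optimizer pushes that outer WHERE into each branch on its own varies by engine and version. That is a good reason to write the filter inside each SELECT explicitly.

```python
import sqlite3

# Hypothetical order tables; amounts are whole currency units for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER, customer_id INTEGER,
                             order_total INTEGER, order_ts TEXT);
    CREATE TABLE mobile_orders (order_id INTEGER, customer_id INTEGER,
                                order_total INTEGER, order_ts TEXT);
    INSERT INTO pos_orders VALUES (1, 7, 40, '2025-12-30'), (2, 7, 25, '2026-01-03');
    INSERT INTO mobile_orders VALUES (3, 7, 10, '2026-01-04'), (4, 8, 60, '2026-01-05');
""")

# Filter inside each branch, before the rows are stacked.
pushed_down = conn.execute("""
    WITH combined AS (
        SELECT customer_id, order_total FROM pos_orders
        WHERE order_ts >= '2026-01-01'
        UNION ALL
        SELECT customer_id, order_total FROM mobile_orders
        WHERE order_ts >= '2026-01-01'
    )
    SELECT customer_id, SUM(order_total) AS total_spend
    FROM combined
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# Filter once, after stacking: same answer, but every row flows into the
# union unless the planner pushes the predicate down by itself.
filtered_late = conn.execute("""
    WITH combined AS (
        SELECT customer_id, order_total, order_ts FROM pos_orders
        UNION ALL
        SELECT customer_id, order_total, order_ts FROM mobile_orders
    )
    SELECT customer_id, SUM(order_total) AS total_spend
    FROM combined
    WHERE order_ts >= '2026-01-01'
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(pushed_down)  # [(7, 35), (8, 60)]
```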

When UNION ALL can still be slow

UNION ALL doesn’t fix underlying bottlenecks. If each SELECT is slow, the union won’t save you. I focus on:

  • Indexes that support the WHERE clause and JOINs inside each SELECT
  • Column pruning: select only the columns you need
  • Partition pruning if you have partitioned tables

If you’re on a distributed warehouse, I also pay attention to data shuffling. UNION ALL by itself usually just concatenates. A global operation that follows it, like a GROUP BY over the combined rows, is what typically triggers a shuffle.

When you should NOT use UNION ALL

I recommend UNION ALL most of the time, but there are cases where it’s the wrong tool.

1) You must ensure uniqueness in the output

If you need a unique list of customers across multiple sources, UNION is clearer. You could use DISTINCT on UNION ALL, but that’s more verbose and easier to miss in code review.

SELECT customer_id
FROM crm_customers
UNION
SELECT customer_id
FROM billing_customers;

2) You are modeling a canonical record

If you’re merging sources to create a single “golden record,” you need explicit merging rules. UNION ALL only stacks rows; it doesn’t reconcile conflicts. For canonical records, I recommend a staged approach:

  • UNION ALL the sources
  • Assign source priorities and timestamps
  • Select the most recent or highest‑priority row per entity

Here’s a practical version:

WITH combined AS (
    SELECT customer_id, email, updated_at, 2 AS priority
    FROM marketing_customers
    UNION ALL
    SELECT customer_id, email, updated_at, 1 AS priority
    FROM billing_customers
), ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY priority ASC, updated_at DESC
        ) AS rn
    FROM combined
)
SELECT customer_id, email, updated_at
FROM ranked
WHERE rn = 1;

This produces a deterministic, explainable output, which is what you want for canonical data.

3) You’re dealing with sensitive duplication

In regulated domains (health, finance), duplicates can have legal meaning. It’s safer to treat duplicates explicitly and log them, rather than letting a UNION silently remove them. I default to UNION ALL plus explicit de‑duplication so the rule is visible.

Advanced patterns I actually use

Once you’re comfortable with UNION ALL, you can build more powerful constructs. These are the ones I find most useful in modern pipelines.

Pattern: “Delta + Snapshot” rollup

A common data warehouse design is a full snapshot table plus daily deltas. You can combine them using UNION ALL and then choose the latest record per key.

WITH unioned AS (
    SELECT customer_id, status, updated_at
    FROM customer_snapshot
    UNION ALL
    SELECT customer_id, status, updated_at
    FROM customer_delta
    WHERE updated_at >= '2026-01-01'
), ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM unioned
)
SELECT customer_id, status, updated_at
FROM ranked
WHERE rn = 1;

This structure scales well and makes it easy to backfill or reprocess data.

Pattern: “Soft delete awareness” in unions

If one source can mark rows as deleted, make that explicit so your downstream query can resolve conflicts.

SELECT account_id, status, updated_at, false AS is_deleted
FROM active_accounts
UNION ALL
SELECT account_id, status, deleted_at AS updated_at, true AS is_deleted
FROM deleted_accounts;

Then you can decide whether to filter deletes or keep them for audit views.
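One way to resolve the flags downstream is to rank by updated_at and keep only accounts whose latest row is not a delete. This sketch assumes that rule and uses Python’s sqlite3 module with hypothetical tables. It writes the flags as 0 and 1 because not every engine has boolean literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE active_accounts (account_id INTEGER, status TEXT, updated_at TEXT);
    CREATE TABLE deleted_accounts (account_id INTEGER, status TEXT, deleted_at TEXT);
    INSERT INTO active_accounts VALUES (1, 'active', '2026-01-02'), (2, 'active', '2026-01-03');
    -- Account 2 was deleted after its last active update.
    INSERT INTO deleted_accounts VALUES (2, 'closed', '2026-01-10');
""")

current = conn.execute("""
    WITH unioned AS (
        SELECT account_id, status, updated_at, 0 AS is_deleted FROM active_accounts
        UNION ALL
        SELECT account_id, status, deleted_at AS updated_at, 1 AS is_deleted
        FROM deleted_accounts
    ), ranked AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY account_id ORDER BY updated_at DESC
        ) AS rn
        FROM unioned
    )
    -- Keep the latest row per account, then hide accounts whose latest row is a delete.
    SELECT account_id, status
    FROM ranked
    WHERE rn = 1 AND is_deleted = 0
    ORDER BY account_id
""").fetchall()

print(current)  # [(1, 'active')]
```

For an audit view, drop the is_deleted filter and keep every row.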

Pattern: Schema evolution with default values

If one source has a new column, you can align schemas with defaults:

SELECT order_id, customer_id, total_cents, promo_code
FROM orders_v2
UNION ALL
SELECT order_id, customer_id, total_cents, NULL AS promo_code
FROM orders_v1;

This keeps a stable output schema while you migrate.
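Here is a quick sketch of the migration pattern using Python’s sqlite3 module with hypothetical orders_v1 and orders_v2 tables. It also shows that the output column names come from the first SELECT. That is one reason to put the branch with the complete, well‑named column list first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_v1 (order_id INTEGER, customer_id INTEGER, total_cents INTEGER);
    CREATE TABLE orders_v2 (order_id INTEGER, customer_id INTEGER,
                            total_cents INTEGER, promo_code TEXT);
    INSERT INTO orders_v1 VALUES (1, 7, 1500);
    INSERT INTO orders_v2 VALUES (2, 8, 2000, 'NEWYEAR'), (3, 9, 900, NULL);
""")

cur = conn.execute("""
    SELECT order_id, customer_id, total_cents, promo_code FROM orders_v2
    UNION ALL
    SELECT order_id, customer_id, total_cents, NULL AS promo_code FROM orders_v1
""")
# Output column names are taken from the first SELECT (orders_v2 here).
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
promo_orders = [r for r in rows if r[3] is not None]

print(columns)                       # ['order_id', 'customer_id', 'total_cents', 'promo_code']
print(len(rows), len(promo_orders))  # 3 1
```

After the union, a v1 order and a v2 order without a promo both show NULL. If you need to tell them apart, add a source tag.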

Pattern: “Union then aggregate” for blended KPIs

A frequent need is to create a single metric across sources. UNION ALL is perfect here because it preserves volume before aggregation.

WITH events AS (
    SELECT user_id, revenue_cents, event_ts
    FROM web_purchases
    UNION ALL
    SELECT user_id, revenue_cents, event_ts
    FROM mobile_purchases
    UNION ALL
    SELECT user_id, revenue_cents, event_ts
    FROM pos_purchases
)
SELECT DATE(event_ts) AS day, SUM(revenue_cents) / 100.0 AS revenue
FROM events
GROUP BY DATE(event_ts)
ORDER BY day;

The key is to keep the event stream intact so you can compute accurate totals.

Pattern: “Segmented views” with multiple filters

Sometimes the same table should contribute rows to multiple logical segments. UNION ALL lets you tag each segment explicitly and keep the details.

SELECT user_id, 'trial' AS segment, created_at AS segment_ts
FROM users
WHERE plan = 'trial'
UNION ALL
SELECT user_id, 'paid' AS segment, upgraded_at AS segment_ts
FROM users
WHERE plan = 'paid'
UNION ALL
SELECT user_id, 'churned' AS segment, churned_at AS segment_ts
FROM users
WHERE plan = 'churned';

This is a clean way to build a segment timeline for analysis.

Edge cases and practical guidance

Here are a few scenarios that are easy to overlook.

Case: NULL handling in UNION ALL

NULL values can’t trigger de‑duplication in UNION ALL, because there is no de‑duplication. But if you later apply DISTINCT to the output, DISTINCT treats NULLs in the same column as equal. Two rows that match on every other column and both have NULL in the same column collapse into one. That can bite you if NULLs represent “unknown but distinct” values. If that distinction matters, add a surrogate key or a source tag so each row is uniquely identifiable.

A simple fix is to add a composite ID:

SELECT CONCAT('web:', event_id) AS event_key, user_id, event_ts
FROM web_events
UNION ALL
SELECT CONCAT('mobile:', event_id) AS event_key, user_id, event_ts
FROM mobile_events;

Now downstream DISTINCT is much safer.
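Here is a sketch of the problem and the fix. It uses Python’s sqlite3 module and hypothetical web_events and mobile_events tables whose anonymous events happen to share an event_id. It builds the key with the || operator instead of CONCAT because older SQLite builds lack CONCAT; the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_events (event_id INTEGER, user_id INTEGER, event_ts TEXT);
    CREATE TABLE mobile_events (event_id INTEGER, user_id INTEGER, event_ts TEXT);
    -- Two different anonymous events whose IDs collide across systems.
    INSERT INTO web_events VALUES (101, NULL, '2026-01-05 10:00');
    INSERT INTO mobile_events VALUES (101, NULL, '2026-01-05 10:00');
""")

# DISTINCT treats the two NULL user_ids as equal, so the events collapse.
naive = conn.execute("""
    SELECT DISTINCT event_id, user_id, event_ts FROM (
        SELECT event_id, user_id, event_ts FROM web_events
        UNION ALL
        SELECT event_id, user_id, event_ts FROM mobile_events
    )
""").fetchall()

# A source-prefixed key keeps them apart.
keyed = conn.execute("""
    SELECT DISTINCT event_key, user_id, event_ts FROM (
        SELECT 'web:' || event_id AS event_key, user_id, event_ts FROM web_events
        UNION ALL
        SELECT 'mobile:' || event_id AS event_key, user_id, event_ts FROM mobile_events
    )
""").fetchall()

print(len(naive), len(keyed))  # 1 2
```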

Case: Mixed time zones

If one source stores timestamps in UTC and another stores local time, UNION ALL will happily stack them. The issue shows up later when you sort or aggregate by date. Normalize your time zones inside each SELECT.

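-- Assumes web_orders.order_ts is TIMESTAMP WITH TIME ZONE and store_orders.order_ts
-- is a local TIMESTAMP without time zone (PostgreSQL AT TIME ZONE semantics).
SELECT order_id, order_ts AT TIME ZONE 'UTC' AS order_ts_utc
FROM web_orders
UNION ALL
SELECT order_id, (order_ts AT TIME ZONE 'America/Los_Angeles') AT TIME ZONE 'UTC' AS order_ts_utc
FROM store_orders;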

I always normalize early; it avoids subtle reporting drift.

Case: Different levels of granularity

Sometimes one source is at the event level and another is aggregated. If you UNION ALL them, you’ll mix granular rows with summary rows, which produces wrong totals, such as double counting. Either aggregate both to the same level or keep them in separate streams and blend later.

A safe approach is to aggregate before unioning:

SELECT user_id, DATE(event_ts) AS day, COUNT(*) AS events
FROM web_events
GROUP BY user_id, DATE(event_ts)
UNION ALL
SELECT user_id, DATE(event_ts) AS day, COUNT(*) AS events
FROM mobile_events
GROUP BY user_id, DATE(event_ts);

Now the granularity is aligned.
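Here is what mixing levels does to a total. The sketch uses Python’s sqlite3 module and a hypothetical mobile_daily table that is already aggregated per user and day. Counting rows in the mixed stream reports 3 events for user 1. Aggregating the granular source first and then summing gives the true 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_events (user_id INTEGER, event_ts TEXT);
    -- Hypothetical pre-aggregated source: one row per user per day.
    CREATE TABLE mobile_daily (user_id INTEGER, day TEXT, events INTEGER);
    INSERT INTO web_events VALUES (1, '2026-01-05 09:00'), (1, '2026-01-05 17:30');
    INSERT INTO mobile_daily VALUES (1, '2026-01-05', 3);
""")

# Wrong: the single summary row is counted as if it were one event.
mixed_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT user_id, DATE(event_ts) AS day FROM web_events
        UNION ALL
        SELECT user_id, day FROM mobile_daily
    )
""").fetchone()[0]

# Right: bring the granular source to the same grain, then sum.
aligned = conn.execute("""
    SELECT user_id, day, SUM(events) AS total_events
    FROM (
        SELECT user_id, DATE(event_ts) AS day, COUNT(*) AS events
        FROM web_events
        GROUP BY user_id, DATE(event_ts)
        UNION ALL
        SELECT user_id, day, events FROM mobile_daily
    )
    GROUP BY user_id, day
""").fetchall()

print(mixed_count, aligned)  # 3 [(1, '2026-01-05', 5)]
```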

Case: Duplicate keys are intentional

Think about payments or inventory movement. The same order_id might appear twice if a customer split their payment. If you use UNION instead of UNION ALL, you can silently delete that second payment record. The safe policy is: if a key can repeat in the real world, preserve it unless you have an explicit rule that says otherwise.

Case: Different column order in views

Views evolve. If one view adds columns or changes order, a UNION ALL can break or, worse, silently misalign if you use SELECT *.

I never use SELECT * across UNION ALL boundaries. I always list columns in order and alias them.

Case: Non‑deterministic ordering

UNION ALL doesn’t guarantee order. If you rely on order without an explicit ORDER BY, your results may differ between runs. Always apply ORDER BY in the outer query when order matters.

Alternative approaches: when UNION ALL isn’t the only choice

UNION ALL is the right tool in many cases, but it’s not the only way to combine datasets. Sometimes a different approach is cleaner or safer.

1) Use JOINs when you need horizontal alignment

If your goal is to enrich each row with additional attributes rather than stack rows, a JOIN is correct.

SELECT o.order_id, o.order_ts, c.customer_tier
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

If you used UNION ALL here, you’d create two separate rows instead of one enriched row.

2) Use MERGE/UPSERT for persistent canonical tables

When you want to combine sources into a table that represents the latest state, a MERGE or UPSERT is typically more appropriate than repeatedly unioning the same sources at query time.

MERGE INTO customer_master t
USING new_customers s
    ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET email = s.email, updated_at = s.updated_at
WHEN NOT MATCHED THEN
    INSERT (customer_id, email, updated_at)
    VALUES (s.customer_id, s.email, s.updated_at);

I still use UNION ALL to stage the data before a MERGE, but the final step is persistent and deterministic.
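SQLite has no MERGE statement, so this sketch swaps in its INSERT ... ON CONFLICT DO UPDATE upsert (SQLite 3.24 or newer) to show the same flow: stage the sources with UNION ALL, then apply them to a persistent table. The table names and rows are hypothetical. The WHERE 1 is there because SQLite’s documentation asks for a WHERE clause when an upsert reads from a SELECT, to avoid a parsing ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_master (
        customer_id INTEGER PRIMARY KEY,
        email TEXT,
        updated_at TEXT
    );
    CREATE TABLE crm_customers (customer_id INTEGER, email TEXT, updated_at TEXT);
    CREATE TABLE billing_customers (customer_id INTEGER, email TEXT, updated_at TEXT);
    INSERT INTO customer_master VALUES (1, 'old@example.com', '2025-12-01');
    INSERT INTO crm_customers VALUES (1, 'new@example.com', '2026-01-10');
    INSERT INTO billing_customers VALUES (2, 'two@example.com', '2026-01-11');
""")

# Stage both sources with UNION ALL, then upsert into the master table.
conn.execute("""
    INSERT INTO customer_master (customer_id, email, updated_at)
    SELECT customer_id, email, updated_at
    FROM (
        SELECT customer_id, email, updated_at FROM crm_customers
        UNION ALL
        SELECT customer_id, email, updated_at FROM billing_customers
    ) AS staged
    WHERE 1  -- avoids SQLite's ON CONFLICT parsing ambiguity after a SELECT
    ON CONFLICT(customer_id) DO UPDATE SET
        email = excluded.email,
        updated_at = excluded.updated_at
""")
conn.commit()

master = conn.execute(
    "SELECT customer_id, email, updated_at FROM customer_master ORDER BY customer_id"
).fetchall()
print(master)
```

Deduplicate the staged rows first, for example with the ROW_NUMBER pattern from Mistake 5. Engines differ when one statement hits the same key twice: PostgreSQL’s ON CONFLICT raises an error, while SQLite applies the rows one after another.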

3) Use INTERSECT and EXCEPT for overlap audits

If you’re trying to understand overlap rather than just combine data, INTERSECT shows the keys that appear in both sources, and EXCEPT shows the keys that appear in only one. I often run these before deciding whether to use UNION or UNION ALL.

SELECT order_id FROM pos_orders
INTERSECT
SELECT order_id FROM mobile_orders;

This tells you how much overlap to expect. Then you can decide how to handle it.
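Here is a small sketch of the audit using Python’s sqlite3 module with hypothetical order IDs. INTERSECT finds the keys both sources share, and EXCEPT finds the ones only the POS system has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER);
    CREATE TABLE mobile_orders (order_id INTEGER);
    INSERT INTO pos_orders VALUES (1), (2), (3);
    INSERT INTO mobile_orders VALUES (3), (4);
""")

# Keys present in both sources: candidates for cross-system duplicates.
in_both = [row[0] for row in conn.execute("""
    SELECT order_id FROM pos_orders
    INTERSECT
    SELECT order_id FROM mobile_orders
    ORDER BY order_id
""")]

# Keys only the POS system has.
pos_only = [row[0] for row in conn.execute("""
    SELECT order_id FROM pos_orders
    EXCEPT
    SELECT order_id FROM mobile_orders
    ORDER BY order_id
""")]

print(in_both, pos_only)  # [3] [1, 2]
```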

4) Use window functions for controlled merging

When you need a “best row” rule, UNION ALL plus ROW_NUMBER is powerful, but sometimes a window function on a single table is enough. Always match the tool to the problem.

Deep dive: a practical reconciliation workflow

Here’s a more complete workflow I use for reconciliation tasks, where duplicates are expected and must be audited, not removed.

Step 1: Normalize columns and add a source key

WITH normalized AS (
    SELECT
        order_id,
        customer_id,
        total_cents,
        order_ts,
        'pos' AS source
    FROM pos_orders
    WHERE order_ts >= '2026-01-01'
    UNION ALL
    SELECT
        order_id,
        customer_id,
        total_cents,
        order_ts,
        'mobile' AS source
    FROM mobile_orders
    WHERE order_ts >= '2026-01-01'
    UNION ALL
    SELECT
        order_id,
        customer_id,
        total_cents,
        order_ts,
        'web' AS source
    FROM web_orders
    WHERE order_ts >= '2026-01-01'
)
SELECT * FROM normalized;

Step 2: Count duplicates by key across sources

WITH normalized AS (
    SELECT order_id, customer_id, total_cents, order_ts, 'pos' AS source FROM pos_orders
    UNION ALL
    SELECT order_id, customer_id, total_cents, order_ts, 'mobile' AS source FROM mobile_orders
    UNION ALL
    SELECT order_id, customer_id, total_cents, order_ts, 'web' AS source FROM web_orders
)
SELECT order_id, COUNT(*) AS occurrences
FROM normalized
GROUP BY order_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

This tells you which orders appear across systems. Sometimes that’s a bug; sometimes it’s a valid multi‑channel flow. The difference matters.

Step 3: Build a conflict‑aware rollup

WITH normalized AS (
    SELECT order_id, customer_id, total_cents, order_ts, 'pos' AS source FROM pos_orders
    UNION ALL
    SELECT order_id, customer_id, total_cents, order_ts, 'mobile' AS source FROM mobile_orders
    UNION ALL
    SELECT order_id, customer_id, total_cents, order_ts, 'web' AS source FROM web_orders
), ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY order_ts DESC
        ) AS rn
    FROM normalized
)
SELECT order_id, customer_id, total_cents, order_ts, source
FROM ranked
WHERE rn = 1;

Here I’m explicitly choosing the most recent order record. That’s a business rule, not a database default.

Production guardrails I add around UNION ALL

When UNION ALL is part of a critical data pipeline, I wrap it with guardrails that catch drift early. These are simple but extremely effective.

1) Row count checks per source

I log row counts from each source and set alerts on large deviations.

SELECT source, COUNT(*) AS row_count
FROM (
    SELECT 'pos' AS source, order_id FROM pos_orders
    UNION ALL
    SELECT 'mobile' AS source, order_id FROM mobile_orders
    UNION ALL
    SELECT 'web' AS source, order_id FROM web_orders
) s
GROUP BY source
ORDER BY row_count DESC;

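A source that drops to zero rows disappears from this output entirely rather than showing up with a zero. I therefore compare the result against the list of sources I expect, and that way I know before downstream reports break.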
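Here is a minimal sketch of that comparison in Python with the sqlite3 module. The expected source list and the tables are hypothetical. The point is that an empty source produces no group at all, so the check looks for missing keys rather than for a zero count.

```python
import sqlite3

# Hypothetical list of sources the pipeline expects to see on every run.
EXPECTED_SOURCES = ("pos", "mobile", "web")

COUNT_SQL = """
    SELECT source, COUNT(*) AS row_count
    FROM (
        SELECT 'pos' AS source, order_id FROM pos_orders
        UNION ALL
        SELECT 'mobile' AS source, order_id FROM mobile_orders
        UNION ALL
        SELECT 'web' AS source, order_id FROM web_orders
    ) s
    GROUP BY source
"""

def empty_sources(conn):
    """Return expected sources that contributed no rows this run."""
    counts = dict(conn.execute(COUNT_SQL).fetchall())
    # An empty source yields no group at all, so a missing key means zero.
    return [s for s in EXPECTED_SOURCES if counts.get(s, 0) == 0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos_orders (order_id INTEGER);
    CREATE TABLE mobile_orders (order_id INTEGER);
    CREATE TABLE web_orders (order_id INTEGER);  -- left empty on purpose
    INSERT INTO pos_orders VALUES (1), (2);
    INSERT INTO mobile_orders VALUES (3);
""")

print(empty_sources(conn))  # ['web']
```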

2) Column type assertions

Even where the engine would infer matching types, I put explicit casts in every SELECT. It feels verbose but makes the output schema deterministic.

SELECT
    CAST(order_id AS BIGINT) AS order_id,
    CAST(customer_id AS BIGINT) AS customer_id,
    CAST(total_cents AS BIGINT) AS total_cents,
    CAST(order_ts AS TIMESTAMP) AS order_ts
FROM pos_orders
UNION ALL
SELECT
    CAST(order_id AS BIGINT) AS order_id,
    CAST(customer_id AS BIGINT) AS customer_id,
    CAST(total_cents AS BIGINT) AS total_cents,
    CAST(order_ts AS TIMESTAMP) AS order_ts
FROM mobile_orders;

Yes, it’s repetitive. It’s also unambiguous.

3) Explicit source tagging

I never merge sources without a source column. It keeps downstream logic readable and makes troubleshooting easy.

4) “Smoke test” query

Before I push a change, I run a tiny sample query with LIMIT to validate output shape and ordering.

SELECT *
FROM (
    SELECT order_id, order_ts, 'pos' AS source FROM pos_orders
    UNION ALL
    SELECT order_id, order_ts, 'mobile' AS source FROM mobile_orders
) x
ORDER BY order_ts DESC
LIMIT 50;

This catches misaligned columns and odd timestamps early.

Performance tuning beyond UNION ALL

If your UNION ALL queries are still slow, the bottleneck is usually elsewhere. Here are the levers that make the biggest difference:

1) Filter early, not late

Push WHERE clauses into each SELECT rather than filtering after the union. That reduces the amount of data that flows into the union.

2) Avoid SELECT *

Selecting extra columns slows IO and can force unnecessary data movement. List only the columns you need.

3) Use incremental sources

If a source provides change‑data capture or incremental feeds, use them. UNION ALL works beautifully with deltas, and it keeps the result set smaller.

4) Keep ordering out of the union

Don’t ORDER BY inside each SELECT. Let the final query order the combined result.

5) Watch out for implicit casts

If one source has a VARCHAR and another has INT, the engine may cast to VARCHAR. That makes numeric aggregation slower. Cast explicitly to a numeric type when you can.

Traditional vs modern workflows (and why UNION ALL still matters)

Even with modern tooling, UNION ALL is still a foundational building block. The difference is how you structure and validate it.

  • Query structure: traditional SQL is one big query; a modern workflow uses layered models with reusable CTEs or views.
  • Validation: manual spot checks, versus automated row counts, schema tests, and data tests.
  • Documentation: minimal, versus inline comments, model descriptions, and lineage graphs.
  • Performance checks: post‑hoc, versus query planning and cost‑based analysis during development.

No matter how modern the workflow, UNION ALL remains the fastest, simplest way to stack sets.

SQL UNION ALL in a 2026 workflow

Modern teams often use SQL through data build tools, orchestration frameworks, and AI‑assisted query editors. That doesn’t change how UNION ALL works, but it changes how you structure and validate it.

Here’s what I recommend in a 2026‑style workflow:

  • Reusable models: Create a single “combined” model using UNION ALL, then reuse it in downstream models. This reduces drift and makes it easy to audit.
  • Query tests: Add tests to check for unexpected duplicates and row counts. A simple count per source can catch issues early.
  • AI‑assisted review: I often ask AI tools to check for column alignment and type mismatches. It’s a great second pair of eyes, but I still validate with sample queries.
  • Observability: Record row counts per source each run. If a source suddenly drops to zero, UNION ALL won’t tell you—it will happily return fewer rows.

A sample validation query I use in pipelines:

SELECT source, COUNT(*) AS row_count
FROM (
    SELECT 'web' AS source, event_id FROM web_events
    UNION ALL
    SELECT 'mobile' AS source, event_id FROM mobile_events
    UNION ALL
    SELECT 'pos' AS source, event_id FROM pos_events
) s
GROUP BY source
ORDER BY row_count DESC;

This is cheap insurance against silent failures.

Frequently asked questions I get about UNION ALL

These are the questions that come up in reviews and training sessions, and the answers I give every time.

“Does UNION ALL preserve order?”

No. In practice many engines return each SELECT’s rows one branch after another, but that behavior is not guaranteed, and parallel plans can interleave rows. If ordering matters, always apply ORDER BY in the outer query.

“Is UNION ALL always faster?”

Usually, but not always. If your query is dominated by other expensive operations (complex JOINs, heavy aggregation, remote scans), the savings from skipping de‑duplication might be small. Still, UNION ALL avoids unnecessary work and is the right default when you want all rows.

“Can I UNION ALL different column names?”

Yes, as long as the positions and types align. The output column names typically come from the first SELECT, so make sure the first SELECT has clean aliases.

“What if I need both deduped and non‑deduped views?”

Build a base view with UNION ALL, then layer a SELECT DISTINCT or ROW_NUMBER on top. This keeps the raw data available while giving you a clean view for downstream reports.
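Here is a sketch of that layering with two views. It uses Python’s sqlite3 module and hypothetical email tables: all_emails keeps every row with a source tag, and unique_emails applies DISTINCT on top of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_emails (customer_id INTEGER, email TEXT);
    CREATE TABLE billing_emails (customer_id INTEGER, email TEXT);
    INSERT INTO crm_emails VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO billing_emails VALUES (1, 'a@example.com');

    -- Base view: every row from every source, tagged.
    CREATE VIEW all_emails AS
        SELECT customer_id, email, 'crm' AS source FROM crm_emails
        UNION ALL
        SELECT customer_id, email, 'billing' AS source FROM billing_emails;

    -- Deduplicated view layered on top; the raw view stays available.
    CREATE VIEW unique_emails AS
        SELECT DISTINCT customer_id, email FROM all_emails;
""")

raw_count = conn.execute("SELECT COUNT(*) FROM all_emails").fetchone()[0]
unique_count = conn.execute("SELECT COUNT(*) FROM unique_emails").fetchone()[0]
print(raw_count, unique_count)  # 3 2
```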

Practical checklist before shipping a UNION ALL query

This is the exact checklist I run through before I deploy or hand off a UNION ALL query.

1) Column alignment: Are the columns in the same order and type in every SELECT?

2) Explicit casts: Are the data types consistent and explicit?

3) Source tagging: Is there a source column for debugging?

4) Early filters: Are WHERE clauses pushed down into each SELECT?

5) Order control: Is any needed ordering applied at the end?

6) Duplication policy: Is the handling of duplicates intentional and visible?

If I can answer “yes” to all six, I ship it.

A short mental model that never fails me

When I’m unsure, I ask myself two questions:

1) Are duplicates meaningful in the real world?

2) Is it safe to remove them without a clear rule?

If the answer to the first question is “yes” or the answer to the second is “no,” I choose UNION ALL. Then I apply explicit rules only if I truly need to deduplicate. That mindset avoids accidental data loss and keeps my queries honest.

Closing thoughts

UNION ALL is deceptively simple, and that’s exactly why it’s powerful. It gives you control: control over duplicates, control over performance, and control over how you reconcile real‑world data that rarely lines up neatly across systems. Most of the painful bugs I see with UNION vs UNION ALL aren’t about syntax—they’re about intent. Developers assume duplicates are bad, or they forget that UNION removes them silently. Once you internalize that difference, your SQL becomes clearer and your results become more trustworthy.

If you remember one thing, make it this: use UNION ALL when you want every row, then apply your own rules when you need to reduce. It’s faster, safer, and more honest about the data you’re working with. That’s the difference between a report that looks clean and a report that’s actually correct.
