As a full-stack developer and database expert who has worked with PostgreSQL for over a decade, I find duplicating tables an incredibly useful technique, and I employ it regularly for development and testing.
In this comprehensive guide, I will share my insight on when and how to duplicate PostgreSQL tables using various SQL commands and syntax – whether you need an exact replica or just the structure.
The Many Benefits of Duplicating PostgreSQL Tables
Based on my extensive experience as a database architect and PostgreSQL power user, here are the top five reasons I duplicate tables:
- Test queries and migrations safely – By duplicating production data into a development environment, I can experiment with new SQL queries, table structures, and index changes without any risk of corrupting live data or causing outages. I duplicate tables all the time to safely test database refactoring.
- Identify inefficient queries – By duplicating tables and monitoring resource utilization between them, I can isolate poorly optimized queries that consume excessive I/O, CPU, or memory. This table cloning approach has helped me resolve slow query performance many times.
- Mask and anonymize data – To respect data privacy regulations, I often duplicate tables first, then transform the copies by masking personal information. The processed datasets can then be used for testing new code against realistic data.
- Understand data models better – By duplicating tables in different ways, like uncompressed, sorted, or with added columns, I gain invaluable insight. I use this table cloning process when documenting or learning unfamiliar databases to prevent affecting production usage.
- Refresh test data – I duplicate tables from production into lower development environments to keep applications working with up-to-date information. Without refreshing duplicates, app bugs can creep in when the test data gets stale.
As you can see, judiciously duplicating tables using PostgreSQL offers tremendous benefits for streamlining development while preventing errors. Next, let’s explore the various techniques to duplicate tables.
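As one concrete illustration of the masking workflow above, a duplicate can be scrubbed in place before it leaves production. This is only a sketch – the customers table and its id, email, and full_name columns are hypothetical:

```sql
-- Duplicate the table, then overwrite personal fields in the copy only
CREATE TABLE customers_masked AS TABLE customers;

UPDATE customers_masked
SET email     = 'user' || id || '@example.com',  -- deterministic fake address
    full_name = 'Customer ' || id;               -- strip real names
```

The original customers table is never touched; only the masked duplicate is handed to lower environments.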
Method #1 – Duplicate with All Data Intact
My favorite method to make an exact replica of an existing PostgreSQL table including all data is:
CREATE TABLE table_copy AS
(SELECT * FROM table_name);
For example, consider an employees table:
emp_id | emp_name | salary | department
-------+-------------+--------+----------------------------
1 | John Wick | 90000 | Security
2 | Sarah Conor | 85000 | Automation
3 | Json Bourne | 70000 | IT Infrastructure
To duplicate:
CREATE TABLE employees_copy AS
(SELECT * FROM employees);
This copies employees to a new employees_copy table with all data intact:
emp_id | emp_name | salary | department
-------+-------------+--------+----------------------------
1 | John Wick | 90000 | Security
2 | Sarah Conor | 85000 | Automation
3 | Json Bourne | 70000 | IT Infrastructure
I use this method to make full snapshots of tables before changing structure or deploying application updates when I want a readily available backup copy.
Downsides to watch out for:
- Copying large tables can slow down the database
- Duplicates double the storage space required
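Before making a full copy, it is worth checking how much extra storage the duplicate will need. pg_total_relation_size and pg_size_pretty are standard PostgreSQL functions for this:

```sql
-- Estimate the extra storage a full duplicate will consume
SELECT pg_size_pretty(pg_total_relation_size('employees')) AS table_size;
```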
That brings us to the next approach of just duplicating table structure.
Method #2 – Copy Structure Only
If you only need the empty table layout duplicated without any data, this SQL does the job:
CREATE TABLE table_name_empty AS
(SELECT * FROM table_name) WITH NO DATA;
The WITH NO DATA clause ensures only the column names and types get copied over to the duplicate, without any rows from the original table.
For example, making an empty duplicate of employees:
CREATE TABLE employees_empty AS
(SELECT * FROM employees) WITH NO DATA;
Gives us:
emp_id | emp_name | salary | department
-------+----------+--------+------------
An empty table, but with columns identical to the original employees table.
As a developer, I use this technique heavily for making disposable copies of tables to test SQL write performance or experiment with schema changes risk-free.
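One caveat: CREATE TABLE AS never copies defaults, constraints, or indexes. When the empty duplicate should carry those too, PostgreSQL's LIKE clause is an alternative:

```sql
-- Copies column definitions plus defaults, constraints, and indexes
CREATE TABLE employees_empty2 (LIKE employees INCLUDING ALL);
```

This matters for testing write performance, since an unindexed duplicate behaves very differently from the indexed original.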
Method #3 – Quick Duplication with AS TABLE
As a fan of streamlined SQL, I often use PostgreSQL's AS TABLE shorthand to swiftly duplicate tables:
CREATE TABLE dupe_table AS TABLE real_table;
This copies everything from real_table into a new dupe_table in a single statement.
For example:
CREATE TABLE employees_v2 AS TABLE employees;
Gives an exact duplicate table again:
emp_id | emp_name | salary | department
-------+-------------+--------+----------------------------
1 | John Wick | 90000 | Security
2 | Sarah Conor | 85000 | Automation
3 | Json Bourne | 70000 | IT Infrastructure
I use AS TABLE duplicates liberally since it's fast to execute and easy to read later compared to lengthier subqueries.
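The shorthand also combines with the WITH NO DATA clause from Method #2, giving a compact structure-only clone:

```sql
-- Structure-only duplicate using the TABLE shorthand
CREATE TABLE employees_v3 AS TABLE employees WITH NO DATA;
```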
Now that you've seen easy techniques to duplicate PostgreSQL tables, let's look at some best practices.
Best Practices for Postgres Table Duplication
Over the years through many database projects, I've compiled some key learnings around properly duplicating tables:
- Use separate schemas – I like to segregate duplicates into distinct database schemas like dev and test for clear organization.
- Suffix names – Append _dev, _test, or _temp to duplicate table names to prevent conflicts with production names.
- Limit bloat – Periodically purge obsolete development duplicates rather than hoarding them forever.
- Index duplicates – Missing indexes on duplicates can skew performance compared to production during testing.
- Anonymize data – Be mindful of sensitive data, especially when duplicating tables across environments for broader access.
- Refresh wisely – Balance duplicating production data to keep test environments realistic without overwriting useful synthetic test data.
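The first two practices combined look like this in SQL (the schema and table names are illustrative):

```sql
-- Keep duplicates out of production's namespace
CREATE SCHEMA IF NOT EXISTS dev;
CREATE TABLE dev.employees_dev AS TABLE public.employees;
```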
Large Table Duplication Methods
When working with large PostgreSQL tables holding millions of rows, duplication poses some challenges:
- Resource intensive to copy huge datasets
- Replicating all rows can take hours
- Can temporarily disrupt operational workload
So for large PostgreSQL tables, I recommend using two alternative duplication approaches:
A) Copy in Background
A two-step copy lets us duplicate data without blocking operations on the live table: first create an empty clone of the structure with LIKE, then bulk-load it from the source. The INSERT takes only a shared read lock on the original, so normal queries and writes continue:
CREATE TABLE big_table_copy (LIKE big_table);
INSERT INTO big_table_copy
TABLE big_table;
This keeps both data insertion and user queries on the original table running smoothly while the copy fills behind the scenes.
If the copy must stay current after the initial load, it can be kept in sync through replication techniques like triggers or logical decoding. I've built real-time production reporting dashboards off asynchronously updated duplicates without slowing source workloads!
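As a minimal sketch of the trigger-based sync mentioned above (inserts only; updates and deletes would need similar handlers, and EXECUTE FUNCTION requires PostgreSQL 11+):

```sql
-- Forward new rows on big_table into the copy as they arrive
CREATE FUNCTION copy_new_row() RETURNS trigger AS $$
BEGIN
  INSERT INTO big_table_copy VALUES (NEW.*);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_copy
AFTER INSERT ON big_table
FOR EACH ROW EXECUTE FUNCTION copy_new_row();
```

For heavier workloads, logical replication avoids the per-row trigger overhead, at the cost of more setup.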
B) Structure Only + Sample Data
For use cases like application testing where having all big table data is unnecessary, I create structural-only duplicates together with small representative samples:
CREATE TABLE large_table_copy AS SELECT * FROM large_table LIMIT 1000;
This efficiently copies a subset of rows while matching the full table schema – note that without an ORDER BY, which 1000 rows you get is arbitrary. I can then generate or augment test data within the duplicate as needed.
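When a more representative sample matters than whatever rows LIMIT happens to reach first, PostgreSQL's built-in TABLESAMPLE clause draws roughly a percentage of the table at random:

```sql
-- Roughly 1% of the table, chosen at random (block-level sampling, very fast)
CREATE TABLE large_table_sample AS
SELECT * FROM large_table TABLESAMPLE SYSTEM (1);
```

SYSTEM sampling picks whole pages, so it is fast but can cluster; the BERNOULLI method samples individual rows more evenly at higher cost.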
Alternative: Table Partitioning
Beyond directly duplicating tables, PostgreSQL also supports table partitioning – splitting a logical table across multiple physical ones transparently.
For example, a purchases table can be partitioned by year:
CREATE TABLE purchases (
purchase_id int,
cust_id int,
purchase_date date,
amount numeric(10,2)
) PARTITION BY RANGE (purchase_date);
CREATE TABLE purchases_2020 PARTITION OF purchases
FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE purchases_2021 PARTITION OF purchases
FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
This abstracts the partitioned purchases table into segments managed by PostgreSQL behind the scenes. Partitioning offers an advanced alternative to manually duplicating tables you expect to scale over time.
I employ table partitioning on large production tables where the partitions can be individually optimized, scanned quickly, compressed differently if needed, and aged out cheaply by dropping or detaching old partitions.
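Retiring old data then becomes a metadata operation rather than a slow bulk DELETE, e.g.:

```sql
-- Remove 2020 data instantly by detaching, then dropping, its partition
ALTER TABLE purchases DETACH PARTITION purchases_2020;
DROP TABLE purchases_2020;
```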
Conclusion
As you can see, duplicating tables is invaluable for developing and testing applications backed by PostgreSQL without compromising production data or performance.
I encourage you to liberally utilize the SQL techniques covered to effortlessly create full copies or empty duplicates of tables as needed to accelerate your database project work.
The methods here serve me very well daily as a database architect – I hope you find them similarly handy! Let me know if any table duplication questions come up.


