As a full-stack developer, having deep visibility into your database schema is invaluable. Knowing your table structures, columns, encodings and sort keys lets you load data properly, optimize queries and keep systems humming.

Amazon Redshift’s SHOW TABLE command unveils this critical table metadata to enable better database development.

In this comprehensive guide, you’ll learn how to leverage SHOW TABLE to boost productivity, including:

  • Columnar Storage Fundamentals
  • SHOW TABLE Syntax
  • Inspecting Tables
  • Contrasting SHOW TABLE and Information Schema
  • Recreating Tables
  • Optimization Best Practices
  • SHOW TABLE for ETL Developers

Let’s examine why understanding SHOW TABLE is crucial for any Redshift developer or analyst.

A Columnar Storage Primer

Before diving into SHOW TABLE, it’s worth exploring Redshift’s columnar data storage and how it impacts tables and design.

Unlike traditional row-based databases, Redshift organizes data by column. A simplified view of a customers table illustrates this:

Row-based Table vs Columnar Table

Row-based Table
+----+--------+--------+-------+ 
| id | name   | city   | order |
+----+--------+--------+-------+
| 1  | John   | Boston | 257   |
| 2  | Sarah  | Miami  | 148   | 
| 3  | Steve  | Austin | 624   |
+----+--------+--------+-------+

Columnar Table
+--------+--------+--------+
| 1      | 2      | 3      |   <- id
+--------+--------+--------+
| John   | Sarah  | Steve  |   <- name
+--------+--------+--------+
| Boston | Miami  | Austin |   <- city
+--------+--------+--------+
| 257    | 148    | 624    |   <- order
+--------+--------+--------+

In the columnar structure, all data for a column is stored together instead of being split across rows. This allows extremely fast column scans and analysis vs row-by-row processing.
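A small illustrative sketch (not Redshift internals) makes the difference concrete: the same customers data stored row-wise versus column-wise, and a scan that only needs one column.

```python
# Row-based: each record holds every field together.
rows = [
    {"id": 1, "name": "John",  "city": "Boston", "order": 257},
    {"id": 2, "name": "Sarah", "city": "Miami",  "order": 148},
    {"id": 3, "name": "Steve", "city": "Austin", "order": 624},
]

# Columnar: each column's values are stored contiguously.
columns = {
    "id":    [1, 2, 3],
    "name":  ["John", "Sarah", "Steve"],
    "city":  ["Boston", "Miami", "Austin"],
    "order": [257, 148, 624],
}

# Summing orders row-wise must visit every field of every record;
# the columnar layout reads only the one block of values it needs.
row_total = sum(r["order"] for r in rows)
col_total = sum(columns["order"])
```

On real Redshift data volumes, that "read only the blocks you need" property is what makes analytic scans fast.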

For developers, several implications emerge:

  • Query only columns you need – Extracting fewer columns speeds queries and reduces I/O.
  • Order columns by usage – Place columns that are queried together next to each other.
  • Optimize encodings – Balance compression and performance based on access patterns.

This is why tools like SHOW TABLE that provide column visibility are critical.

Now let’s see how to use SHOW TABLE to inspect tables.

SHOW TABLE Syntax and Usage Refresher

SHOW TABLE reveals table definition details including:

  • Column names / data types
  • Encodings
  • Sort keys
  • Constraints
  • Storage information
  • CREATE TABLE statement

This makes it easy to analyze and recreate tables.

Syntax:

SHOW TABLE [schema_name.]table_name; 

For example:

SHOW TABLE tickit.venue;
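If you script against the warehouse, the same command can be issued through any DB-API driver (e.g. psycopg2 connected to Redshift). A minimal sketch, where `fetch_table_ddl` is a hypothetical helper and not a Redshift API:

```python
def fetch_table_ddl(cursor, table, schema="public"):
    """Run SHOW TABLE and return the DDL text Redshift emits.

    Assumes a DB-API cursor; SHOW TABLE returns a single row
    containing the table definition.
    """
    cursor.execute(f"SHOW TABLE {schema}.{table};")
    return cursor.fetchone()[0]
```

Usage would look like `ddl = fetch_table_ddl(cur, "venue", schema="tickit")`, with the returned string ready to save or replay.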

Let’s inspect some table metadata to see SHOW TABLE in action.

Inspecting Tables with SHOW TABLE

Being able to instantly view table schemas helps with:

  • Data analysis and loading
  • Query optimization
  • ETL processing
  • Recreating tables
  • And much more

Let’s go through some examples.

Examining Table Columns and Encoding

Consider a users table that tracks website members:

SHOW TABLE users;

Returns:

                    Table "public.users"
     Column     |            Type             | Nullable | Default | Encoding  
----------------+-----------------------------+----------+---------+-----------
 id             | integer                     | not null |         | raw
 name           | character varying(50)       | not null |         | lzo
 city           | character varying(100)      |          |         | lzo
 signup_date    | date                        | not null |         | zstd
 last_login     | timestamp without time zone |          |         | zstd

This provides valuable insight into the columns, data types, nullable constraints and compression encoding to guide workload optimization.

For example, last_login uses zstd encoding, which trades CPU for higher compression, while name uses lzo, which offers faster encoding and decoding. This aligns storage with query priorities.
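The tradeoff is easy to see outside Redshift. Since lzo and zstd aren't in Python's standard library, this sketch uses zlib at two compression levels purely to illustrate the same principle: heavier compression costs more CPU but shrinks storage and I/O.

```python
import zlib

# Repetitive data, like a real warehouse column.
data = b"2023-01-15,Boston,257\n" * 10_000

fast = zlib.compress(data, level=1)   # cheap pass, akin to lzo: quick but larger
small = zlib.compress(data, level=9)  # heavy pass, akin to zstd: slower but denser
```

For hot columns scanned by every query, a faster encoding may win; for cold, bulky columns, the denser one usually pays off.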

Analyzing Table Constraints and Sort Keys

Looking at another table shows additional metadata:

SHOW TABLE sales; 

Returns:

                Table "public.sales"
 Column  |     Type      | Nullable | Default | Encoding | Distkey | Sortkey | Stats_off
---------+---------------+----------+---------+----------+---------+---------+-----------
 id      | integer       | not null |         | lzo      | false   | 1       |
 date    | date          | not null |         | zstd     | false   | 0       |
 amount  | decimal(8,2)  |          |         | mostly32 | false   | 0       |
 product | varchar(100)  |          |         | lzo      | false   | 0       |
 region  | varchar(20)   |          |         | lzo      | false   | 0       |

Additional metadata around NULL constraints, distribution keys, designated sort keys and statistics staleness (stats_off) provides tremendous insight.

This level of detail helps appropriately model data, enhance query performance and manage workloads at scale.
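Distribution keys deserve particular attention. A toy sketch of why they matter: rows are hashed to slices by their distkey, and a low-cardinality key piles rows onto a few slices (the slice count and example columns here are illustrative, not pulled from a real cluster).

```python
from collections import Counter

NUM_SLICES = 4

def slice_counts(distkey_values):
    """Count rows landing on each slice for a given distkey column."""
    return Counter(hash(v) % NUM_SLICES for v in distkey_values)

high_cardinality = slice_counts(range(10_000))        # e.g. a unique sale id
low_cardinality = slice_counts(["us-east"] * 10_000)  # e.g. a region column
```

With a single repeated key value, every row hashes to the same slice, so one node does all the work while the rest idle. That is exactly the skew SHOW TABLE's Distkey column helps you hunt down.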

Comparing SHOW TABLE and Information Schema

Developers familiar with PostgreSQL may be used to relying on the information schema views for metadata. So how does Redshift’s SHOW TABLE compare with the info schema?

While information schema provides table and column details, key differences emerge in several areas:

  1. Performance – SHOW TABLE metadata lookups typically execute far faster than comparable info schema queries, which must join many catalog tables behind the scenes.

  2. Simplicity – SHOW TABLE returns the entire table definition in an easy to read format. Info schema splits details across multiple views requiring joins.

  3. Table recreation – SHOW TABLE contains the CREATE TABLE statement for actually recreating the table. Info schema does not.

  4. Redshift optimizations – SHOW TABLE includes sort keys and distribution styles tailored to Redshift. Info schema presents more generic metadata.

In practice, I rely on SHOW TABLE for rapid iteration and warehouse performance work, while info schema still plays an important role in providing standards-based metadata access. The two approaches complement each other.

Next let’s look at recreating tables with SHOW TABLE.

Recreating Tables from Metadata

One of my favorite SHOW TABLE features is outputting the associated CREATE TABLE statement.

This allows effortlessly recreating tables without needing to manually document and capture all the various properties.

Let’s examine a typical example:

SHOW TABLE event_sales;

Returns table definition along with CREATE TABLE:

                Table "public.event_sales"
Column  | Type | Nullable | (...) 
--------+-----+----------+-------
       .
       .

Table definition:

  CREATE TABLE public.event_sales(
    event_id INTEGER ENCODE lzo,
    event_date DATE ENCODE zstd, 
    category VARCHAR(10) ENCODE lzo,
    tickets_sold DECIMAL(8,2) ENCODE mostly32,
    sales DECIMAL(10,2) ENCODE mostly32 DISTKEY SORTKEY
  )
  DISTSTYLE ALL;

Using this CREATE TABLE statement, we can now easily recreate event_sales to test new distribution styles, reorder columns, modify encodings and more. Extremely helpful when optimizing tables.
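The DDL text is also easy to mine programmatically. A rough sketch of pulling column encodings out of CREATE TABLE output shaped like the example above (the format is assumed from that example, and the regex is deliberately simplistic):

```python
import re

def column_encodings(ddl):
    """Map column name -> encoding for lines like 'name TYPE ENCODE enc'."""
    pattern = re.compile(r"^\s*(\w+)\s+.+?\bENCODE\s+(\w+)", re.MULTILINE)
    return {col: enc for col, enc in pattern.findall(ddl)}

ddl = """CREATE TABLE public.event_sales(
    event_id INTEGER ENCODE lzo,
    event_date DATE ENCODE zstd,
    sales DECIMAL(10,2) ENCODE mostly32 DISTKEY SORTKEY
)
DISTSTYLE ALL;"""

encodings = column_encodings(ddl)
```

A mapping like this is handy for auditing encodings across many tables at once instead of eyeballing each definition.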

Recreating external tables is also simplified. For more on this see the AWS SHOW TABLE documentation.

Now that we’ve covered fundamentals, let’s move on to optimization best practices.

SHOW TABLE Optimization Best Practices

Once developers understand SHOW TABLE, next comes using it to increase development velocity and operational performance.

Here are 7 best practices I’ve refined when working with production Redshift clusters under heavy load:

1. Set Alarms for Skew – Use SHOW TABLE stats to trigger alarms on hot keys from bad distributions. Catch issues before queries slow.

2. Analyze Stats Regularly – Review stats_off data over time as a gauge of query accuracy and performance. Investigate tables/queries triggering large outliers.

3. Simulate Workloads – When recreating tables with SHOW TABLE output, simulate production load levels using Redshift benchmarks for robust testing. Don’t assume light testing is sufficient.

4. Review Encodings Often – Check for suboptimal data compression as data characteristics change over time. Tune encoding types to balance performance against storage savings.

5. Eliminate Unused Sort Keys – Identify sort keys not being leveraged by queries and remove them. Maintaining unused sorts burns resources needlessly.

6. Understand ETL Impacts – Capture SHOW TABLE output over time to analyze how schema changes from ETL affect query times and data volumes. Optimize flows based on evidence.

7. Share Metadata Visibility – Extend SHOW TABLE visibility into data catalogs so engineers and analysts have self-service access to critical metadata.
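Practice #2 above is simple to automate. A small sketch that flags tables whose statistics have drifted; in practice the (table, stats_off) pairs would come from SHOW TABLE output or SVV_TABLE_INFO, and the threshold here is an arbitrary example:

```python
STALE_THRESHOLD = 10.0  # percent stale; pick a value that fits your workload

def stale_tables(table_stats, threshold=STALE_THRESHOLD):
    """Return (table, stats_off) pairs above the threshold, worst first."""
    flagged = [(t, off) for t, off in table_stats if off > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

stats = [("sales", 42.5), ("users", 3.1), ("events", 17.8)]
flagged = stale_tables(stats)
```

Wiring a check like this into a scheduled job turns "analyze stats regularly" from a chore into an alarm.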

Whether optimizing development or ETL pipelines, keep these tips top of mind to enhance workflows.

Next let’s analyze how SHOW TABLE specifically boosts ETL jobs.

SHOW TABLE for ETL Developers

For developers authoring ETL jobs, having instant access to upstream and downstream table definitions via SHOW TABLE is invaluable.

Here are some of the ways I leverage SHOW TABLE in ETL processes:

Validate Code Assumptions – Compare source and target schemas to validate assumptions in transforms. Catch errors early.

Identify Needed Changes – When modifying downstream schemas, use SHOW TABLE to rapidly update ETL logic to handle added columns, new data types etc.

Prevent Schema Skew – Diff source and target table definitions to detect schema drift over time. Continuously align structures.

Annotate Metadata – Embed SHOW TABLE outputs within documentation to freeze schema details for future debugging.

Simulate Table Changes – Use SHOW TABLE to recreate tables with proposed changes to safely test ETL logic performance impacts.

Audit History – Leverage SHOW TABLE outputs when tracking the history of schema changes over time in case of unexpected impacts.

Evaluating the upstream and downstream schemas is a standard task in any ETL workflow. SHOW TABLE makes this metadata readily available so developers can focus on data flow logic versus chasing down schemas.
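The schema-drift check above reduces to a simple diff. A sketch that compares two {column: type} mappings, as you might read them out of SHOW TABLE output for a source and a target table (the example schemas are invented):

```python
def schema_diff(source, target):
    """Report columns added, removed, or retyped between two schemas."""
    return {
        "added":   sorted(target.keys() - source.keys()),
        "removed": sorted(source.keys() - target.keys()),
        "retyped": sorted(c for c in source.keys() & target.keys()
                          if source[c] != target[c]),
    }

source = {"id": "integer", "name": "varchar(50)", "city": "varchar(100)"}
target = {"id": "integer", "name": "varchar(80)", "signup": "date"}

diff = schema_diff(source, target)
```

Run on every deploy, an empty diff confirms the structures are still aligned; a non-empty one tells you exactly which ETL mappings to revisit.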

For even more ways to tune ETL with metadata, see the AWS Glue documentation.

Now let’s wrap up with key insights.

Conclusion and Next Steps

As we’ve explored, the SHOW TABLE command offers tremendous visibility into the critical table metadata needed for Redshift development, including:

  • Columns, data types, constraints
  • Encodings and compression
  • Distribution styles and sort keys
  • Associated CREATE TABLE statements

Tapping into this metadata through SHOW TABLE enables:

Rapid Iteration – Recreate tables effortlessly during optimization cycles

Enhanced Performance – Detect and resolve skew, imbalance and design anti-patterns

Robust ETL – Continuously align data schemas as flows evolve

Deeper Analysis – Combine table metadata with execution plans, locks, storage utilization metrics and more for holistic warehousing observability

If you found this guide helpful, a natural next step is adopting Amazon Redshift best practices around table design, ETL optimization and query tuning. They will make you more productive and enhance your cluster’s performance at scale.

Now put SHOW TABLE to work to unlock the full power of columnar storage and take your analytics to the next level!
