The AVG function is one of the most useful aggregate functions in Redshift SQL. It allows you to easily calculate the arithmetic mean or average of a set of numeric values in a column.

In this comprehensive guide, we will explore when and how to use the Redshift AVG function with some practical examples.

When Should You Use the AVG Function?

Here are some common cases where the AVG aggregate function comes in handy:

  • Find the average order value or ticket size: Calculate the average order amount for ecommerce data to understand customer spending patterns.

  • Track average metric values over time: View trends in average pageviews, conversion rates, etc. per day/week/month.

  • Average numerical grades: Get the class average test scores or assignment grades for analysis.

  • Benchmark averages: Compare category averages like average product prices, salaries, ages, etc. against benchmarks.

  • Gauge central tendency: Understand the center point of a batch of measurements like sensor readings.

Basically, any time you need a quick snapshot of the "typical" value in a set of numbers, the AVG function can help. Keep in mind that the arithmetic mean is pulled toward extreme values, so it is not always the "middle" of the data.

SQL Syntax of the AVG Function

The basic syntax for using the AVG function is:

SELECT AVG(column_or_expression) 
FROM your_table
[WHERE conditions];

Let's break this down:

  • AVG() is the aggregate function name
  • Inside the parentheses is the numeric column or expression you want to average
  • An optional WHERE clause filters rows before they are averaged
  • GROUP BY, HAVING, etc. can be added to compute one average per group

Some key points about syntax:

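  • Works on numeric columns: SMALLINT, INTEGER, BIGINT, DECIMAL/NUMERIC, REAL, and DOUBLE PRECISION
  • For integer arguments Redshift returns an integer type (BIGINT), so the fractional part of the mean is lost; cast the argument to DECIMAL or FLOAT to keep it
  • NULL values are excluded from the calculation
  • DISTINCT inside AVG() averages each unique value only once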
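
The return-type and NULL behavior above is easy to check on a scratch table. The following is a minimal sketch, assuming a hypothetical temporary table avg_demo that is not part of the original examples:

```sql
-- Hypothetical scratch table, used only for illustration.
CREATE TEMP TABLE avg_demo (score INTEGER);
INSERT INTO avg_demo VALUES (1), (2), (NULL);

SELECT AVG(score)                 AS int_avg,      -- integer argument: integer result, fraction lost
       AVG(score::DECIMAL(10, 2)) AS decimal_avg,  -- DECIMAL argument keeps two decimal places
       AVG(score::FLOAT)          AS float_avg,    -- FLOAT argument returns DOUBLE PRECISION
       COUNT(score)               AS non_null_rows -- the NULL row is not counted or averaged
FROM avg_demo;
```

Comparing int_avg with the two cast versions shows why casting matters whenever the averaged column is an integer type.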

Now let's see some examples of using the AVG() function on Redshift tables.

Example 1: Get Average Order Value

Let's say we have an ecommerce orders table with columns for order_id, customer_id, order_date and order_total (cost of order).

We want to find the overall average order value across all orders.

SELECT AVG(order_total) AS average_order
FROM orders;

This calculates the mean amount spent per order. The AS clause names the result column average_order.

Say the average order value is $49.82. This helps us understand customer spending behavior.

We can also get the average order value per customer to segment consumers:

SELECT customer_id, AVG(order_total) AS avg_order
FROM orders
GROUP BY customer_id;

Analyzing the per-customer average order value distribution allows for advanced segmentation and targeting.
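
One possible way to turn per-customer averages into segments is a CASE expression over the aggregate. This is only a sketch: the thresholds (100 and 40) and the segment labels are made-up values for illustration, not recommendations.

```sql
SELECT customer_id,
       COUNT(*)         AS order_count,   -- how many orders back each average
       AVG(order_total) AS avg_order,
       CASE
           WHEN AVG(order_total) >= 100 THEN 'high'    -- hypothetical threshold
           WHEN AVG(order_total) >= 40  THEN 'medium'  -- hypothetical threshold
           ELSE 'low'
       END              AS spend_segment
FROM orders
GROUP BY customer_id
ORDER BY avg_order DESC;
```

Including order_count alongside the average helps flag customers whose average rests on only one or two orders.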

Example 2: Track Average Daily Pageviews

Let's say we collect our website traffic data into a pageviews table with columns date, page_url, and visitor_id, where each row is one page view.

We want to analyze the trend in average pageviews per visitor for each day.

SELECT date, 
       AVG(pageviews::FLOAT) AS daily_avg_pageviews
FROM (
    SELECT date, visitor_id, COUNT(*) AS pageviews
    FROM pageviews
    GROUP BY date, visitor_id
) AS daily_user_table
GROUP BY date
ORDER BY date;

Here the subquery first counts the pageviews of each visitor on each date. The outer query then applies AVG() across the visitors of each date, giving the mean pageviews per visitor per day, ordered chronologically. The cast to FLOAT matters because COUNT returns an integer, and AVG over an integer argument would drop the fractional part.

Charting this daily average data allows us to track site engagement over weeks/months and correlate with other events.
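
For charting, a moving average is often easier to read than raw daily values. The sketch below uses AVG as a window function to compute a 7-row moving average of total daily pageviews. It assumes the pageviews table holds one row per page view and has at least one row for every day, since the ROWS frame counts rows rather than calendar days.

```sql
SELECT date,
       pageviews,
       AVG(pageviews::FLOAT) OVER (
           ORDER BY date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- current day plus the 6 prior rows
       ) AS pageviews_7d_avg
FROM (
    SELECT date, COUNT(*) AS pageviews  -- total page views per day
    FROM pageviews
    GROUP BY date
) AS daily_pageviews
ORDER BY date;
```

The first six rows average over fewer than seven days, so treat the start of the series with care.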

Example 3: Average Test Scores Per Student

Say we have a test_results table containing columns such as student_id, test_date, and test_score.

We want to view the overall average test score per student across all dates.

SELECT student_id, AVG(test_score) AS avg_score
FROM test_results
GROUP BY student_id;

This helps analyze performance by student. Maybe we even spot students needing attention based on low averages.

An optional HAVING clause, placed after GROUP BY, filters the groups by a threshold:

HAVING AVG(test_score) < 70;

This returns only the underperforming students for further analysis.
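
Put together with the per-student query, the complete statement could look like the sketch below. The cutoff of 70 comes from the snippet above; the cast to FLOAT is an added assumption that test_score is stored as an integer and that fractional averages should be kept.

```sql
SELECT student_id,
       AVG(test_score::FLOAT) AS avg_score
FROM test_results
GROUP BY student_id
HAVING AVG(test_score::FLOAT) < 70   -- keep only students averaging below 70
ORDER BY avg_score;
```

WHERE cannot be used for this filter, because it runs before the rows are grouped and the averages exist.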

As you can see, the AVG function helps summarize sets of numeric data at a glance!

When to Use DISTINCT Inside AVG()?

The AVG function accepts an optional DISTINCT keyword inside the parentheses:

AVG(DISTINCT column)

This eliminates duplicate values before the average is calculated.

Use AVG(DISTINCT ...) when you want the mean across the unique values only. Note that it de-duplicates values, not rows: if two employees earn the same salary, that amount is counted once. It does not reduce the influence of outliers, and it can distort the mean when repeated values are legitimate data points.

For example, to average the distinct salary levels paid in 2022:

SELECT AVG(DISTINCT salary) 
FROM employee_salaries
WHERE year = 2022;

DISTINCT can also slow down aggregation on large tables, so use it only when the question really calls for unique values.
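
To see how much DISTINCT changes the result, it can help to compute both versions side by side. This is a sketch against the same employee_salaries table, assuming it has a numeric salary column and a year column as in the query above:

```sql
SELECT AVG(salary)            AS avg_all_rows,      -- every non-NULL salary row counts
       AVG(DISTINCT salary)   AS avg_unique_values, -- each salary amount counts once
       COUNT(salary)          AS salary_rows,
       COUNT(DISTINCT salary) AS unique_salaries
FROM employee_salaries
WHERE year = 2022;
```

If salary_rows and unique_salaries are equal, the two averages are the same and DISTINCT only adds cost.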

Handling NULLs and Zero Values

The AVG function omits NULL values found in the set. The arithmetic mean is only calculated from non-NULL numbers.

For example:

SELECT AVG(sales)
FROM regional_data
WHERE region = 'South';

If the sales column contains NULLs for rows in the South region, those rows are excluded from AVG(). If every value is NULL, or no rows match the filter, AVG returns NULL.

Rows with 0 values, in contrast, are included, because zero is a valid number. If zeros stand for missing data and should be excluded, convert them to NULL with a CASE expression or NULLIF.
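
The three treatments can be compared in one query. This is a sketch against the regional_data table from the example above; which variant is right depends on what NULL and 0 mean in your data.

```sql
SELECT AVG(sales)              AS avg_ignoring_nulls,  -- default: NULL rows skipped
       AVG(COALESCE(sales, 0)) AS avg_nulls_as_zero,   -- NULL rows counted as 0
       AVG(NULLIF(sales, 0))   AS avg_excluding_zeros  -- 0 turned into NULL, then skipped
FROM regional_data
WHERE region = 'South';
```

NULLIF(sales, 0) is shorthand for CASE WHEN sales = 0 THEN NULL ELSE sales END.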

Conclusion

Calculating averages is a common requirement during data analysis. With SQL's intuitive AVG function, you can easily calculate arithmetic means over columns in your Redshift tables.

Key takeaways from this guide:

  • Use AVG() when you need a quick central tendency snapshot
  • Works on numeric columns like order values, pageviews etc.
  • Omits NULLs from calculation automatically
  • DISTINCT inside AVG() averages unique values only
  • Pairs well with GROUP BY, HAVING, etc. for segmented analysis
  • Helps reveal trends, but is itself sensitive to outliers

So next time you need some insightful averages from your reporting data, be sure to leverage the Redshift SQL AVG function!
