Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: October 27, 2024
In SQL, subqueries allow us to execute complex data operations within a single query. One useful feature is the ability to access the outer query table within a subquery. This technique, often referred to as correlated subqueries, allows us to write more efficient queries across different database systems.
In this tutorial, we’ll explore accessing the outer query table within subqueries using our simple University schema. Furthermore, we’ll ensure that our examples work seamlessly across different database management systems.
First, let’s explore what an outer reference is. An outer reference occurs when a subquery refers to a column or a table from the outer or main query. This allows the subquery to access and use data from the outer query.
There are several compelling reasons why outer query referencing can be useful:
Let’s take a look at an example to illustrate the concept:
SELECT s.name, s.gpa
FROM Student s
WHERE s.gpa > (
SELECT AVG(s2.gpa)
FROM Student s2
WHERE s2.enrollment_date = s.enrollment_date
);
+-----------------+------+
| name | gpa |
+-----------------+------+
| John Liu | 4 |
| Rita Ora | 4.2 |
| Samantha Prabhu | 4.9 |
| Philip Mohan | 5 |
| Sam Roberts | 4.3 |
| Roni Roto | 4.44 |
| Reena Roy | 4.15 |
| Agatha Christi | 4.59 |
| Pollards Grey | 4.65 |
+-----------------+------+
In the query above, we filtered for students with a GPA higher than the average GPA of students who enrolled on the same date. The subquery references the s.department_id from the outer query, creating a correlation between the two query levels.
Let’s explore some common use cases for outer query references in subqueries.
The WHERE clause is one of the most common places to use outer query references:
SELECT c.id, c.name
FROM Course c
WHERE EXISTS (
SELECT 1
FROM Registration r
WHERE r.course_id = c.id
AND r.semester = 'SPRING'
);
+-------+---------------------------------------------+
| id | name |
+-------+---------------------------------------------+
| CS111 | Introduction to Operating Systems |
| CS121 | Introduction to Databases |
| CS122 | Relational Databases |
| CS123 | Introduction to Structured Query Language |
| MA111 | Linear Algebra |
| CS531 | Advanced Data Structures |
| CS532 | Advanced Algorithms |
| CE111 | Introduction to Structural Engineering |
| MA241 | Convex Optimizaton |
| MA421 | Discrete Maths-III |
| MA441 | Advanced Optimizaton |
+-------+---------------------------------------------+
From the query above, we filtered for all courses that have at least one registration in the Spring semester. Notably, the subquery references the outer query’s c.id to check for the existence of registrations.
Subqueries in the SELECT clause, also known as scalar subqueries, can effectively use outer references:
SELECT s.name,
s.gpa,
(SELECT COUNT(*)
FROM Registration r
WHERE r.student_id = s.id) AS total_registrations
FROM Student s
WHERE s.gpa >= 4.0;
+-----------------+------+---------------------+
| name | gpa | total_registrations |
+-----------------+------+---------------------+
| John Liu | 4 | 13 |
| Rita Ora | 4.2 | 12 |
| Samantha Prabhu | 4.9 | 12 |
| Phellum Luis | 4.21 | 12 |
| Albert Decosta | 4 | 0 |
| Philip Mohan | 5 | 0 |
| Sam Roberts | 4.3 | 0 |
| Roni Roto | 4.44 | 13 |
| Reena Roy | 4.15 | 0 |
| Agatha Christi | 4.59 | 0 |
| Pollards Grey | 4.65 | 12 |
+-----------------+------+---------------------+
The query above retrieves each student’s name and GPA, along with the total number of course registrations. The subquery in the SELECT clause references s.id from the outer query to count registrations for each student.
Correlated subqueries are particularly efficient as they depend on the outer query for their values:
SELECT c.id,
c.name,
(SELECT COUNT(DISTINCT r.student_id)
FROM Registration r
WHERE r.course_id = c.id) AS enrolled_students
FROM Course c
WHERE c.credits > 5;
+-------+---------------------------------------------+-------------------+
| id | name | enrolled_students |
+-------+---------------------------------------------+-------------------+
| CE111 | Introduction to Structural Engineering | 1 |
| CE121 | Geotechnical Engineering-I | 1 |
| CE131 | Mechanics of Solids-I | 2 |
| CE141 | Mechanics of Fluids-I | 1 |
| CE151 | Modeling Tools for Civil Engineers | 1 |
| CE161 | Statistics for Civil Engineers | 2 |
| CE221 | Geotechnical Engineering-II | 1 |
| MA241 | Convex Optimizaton | 1 |
| ME161 | Supply Chain Management | 1 |
| ME211 | Advanced Dynamics-II | 1 |
| ME221 | Nanotechnology-II | 1 |
+-------+---------------------------------------------+-------------------+
In the example above, we counted the number of students enrolled in each course who are taking five credits or more. Furthermore, the correlated subquery was executed once for each row on the outer query.
When working with complex queries, we may need to reference tables or columns from the outer query within a subquery.
First, let’s see how we can demonstrate referencing an outer query’s table in a subquery:
SELECT s.name, s.gpa
FROM Student s
WHERE EXISTS (
SELECT 1
FROM Registration r
WHERE r.student_id = s.id
AND r.semester = 'SPRING'
AND r.year = 2023
);
+-----------------+------+
| name | gpa |
+-----------------+------+
| John Liu | 4 |
| Rita Ora | 4.2 |
| Philip Lose | 3.8 |
| Samantha Prabhu | 4.9 |
| Peter Liu | NULL |
| Ritu Raj | NULL |
| Julia Roberts | 3.04 |
| Pollards Grey | 4.65 |
| Rose Rit | 3.57 |
| Phellum Luis | 4.21 |
| Roni Roto | 4.44 |
| Piu Liu | 2.99 |
+-----------------+------+
In the example above, we retrieved students who registered for courses in the 2023 Spring semester. The subquery references s.id from the outer query to establish the correlation between students and their registrations.
MySQL, PostgreSQL, and SQL Server all support outer query references. However, we need to consider some limitations, particularly subqueries in the FROM clause.
To illustrate this, let’s consider an example below:
SELECT s.name, c.course_count
FROM Student s
JOIN (
SELECT student_id, COUNT(*) AS course_count
FROM Registration
WHERE student_id = s.id -- This reference is not allowed
GROUP BY student_id
) c ON s.id = c.student_id;
The query above attempts to count the courses for each student using a subquery in the FROM clause. However, the reference to s.id within the subquery is not allowed. This limitation exists because the subquery in the FROM clause is logically executed before the outer query, thus making outer references impossible at that stage.
Despite the limitations above, we can achieve our results using alternative approaches that work with different database systems. These solutions not only address the constraints but also demonstrate the flexibility of SQL in solving complex querying problems.
One effective cross-compatible solution is to use a JOIN operation combined with GROUP BY:
SELECT s.name, COUNT(r.id) AS course_count
FROM Student s
LEFT JOIN Registration r ON s.id = r.student_id
GROUP BY s.id, s.name;
+-----------------+--------------+
| name | course_count |
+-----------------+--------------+
| John Liu | 13 |
| Rita Ora | 12 |
| Philip Lose | 12 |
| Samantha Prabhu | 12 |
| Vikas Jain | 0 |
| Jia Grey | 0 |
| Agatha Christi | 0 |
| Julia Roberts | 12 |
| Pollards Grey | 12 |
| Potu Singh | 0 |
+-----------------+--------------+
In the query above, we joined the Student and Registration tables and then grouped the results by student. Subsequently, this approach effectively counts the courses for each student without needing a subquery in the FROM clause.
Another efficient and compatible alternative is to use a correlated subquery in the SELECT clause:
SELECT s.name,
(SELECT COUNT(*)
FROM Registration r
WHERE r.student_id = s.id) AS course_count
FROM Student s
ORDER BY course_count DESC;
+-----------------+--------------+
| name | course_count |
+-----------------+--------------+
| John Liu | 13 |
| Rose Rit | 13 |
| Roni Roto | 13 |
| Rita Ora | 12 |
| Philip Lose | 12 |
| Samantha Prabhu | 12 |
| Phellum Luis | 12 |
| Peter Liu | 12 |
+-----------------+--------------+
This method uses a correlated subquery to count registration for each student. Furthermore, the subquery was executed for each row in the outer query, allowing us to reference the outer query’s table within the subquery.
Both solutions above achieve the same results as our original query attempt, but in a way that works across different database management systems. Moreover, they showcase different SQL techniques that can be applied in various scenarios.
In this article, we began by understanding outer query references in the SELECT and WHERE clauses and why they’re useful in subqueries. We examined the limitations of referencing outer queries, particularly in FROM clause subqueries.
Finally, we explored practical alternatives using JOINs and correlated subqueries in the SELECT clause. We demonstrated how to achieve complex query requirements while maintaining database compatibility.