Query Exercise: Find Foreign Key Problems

Let’s say we’ve decided to implement foreign keys, and we need to find data that’s going to violate our desired keys.

We’re going to use the Stack Overflow database, and we’ll focus on these 3 tables:

users table: with id column as its primary key
posts table: with owneruserid column noting which users.id wrote the post
comments table: with userid column noting which users.id wrote the comment, and postid column noting which post.id is being commented on

Before we attempt to implement foreign keys, we need to find data which might violate the foreign key relationships. Are there any:

posts rows whose owneruserid does not match up with a valid users.id
comments rows whose userid doesn’t match up with a valid users.id
comments rows whose postid doesn’t match up with a valid posts.id
And to make your task easier, let’s focus on just the first 100K rows in each table (rows with an id <= 100000) to see whether or not foreign keys make sense for this database

Your query exercise has a few parts:

Write one or more queries to find these answers as quickly as possible with low load on the database.
Given what you find, hypothesize about what might have caused the foreign key problems.
Given what you learned, are there any changes you want to make to the app, processes, or database?

You can post your answers in this blog post’s comments, and discuss each others’ ideas. You can also use <pre></pre> tags in your comments to frame code, like this:

<pre>
select stuff from tables
</pre>

We’ll revisit your answers in this post. No peeking! Have fun solving it first.

3 Comments. Leave new

Michael J Swart
January 8, 2024 7:58 am
Here’s my go:
```
select count(*) as number_of_orphaned_posts
from (
    select owneruserid from public.posts where owneruserid < 100000
    except
    select id from public.users where id < 100000
) as a;

select count(*) as number_of_orphaned_comments
from (
    select userid from public.comments where userid < 100000
    except
    select id from public.users where id < 100000
) as a;

select count(*) as number_of_comments_under_orphaned_posts
from (
    select p.owneruserid
    from public.comments c
    inner join public.posts p
        on c.postid = p.id 
    where p.owneruserid < 100000

    except

    select id from public.users where id < 100000
) as a;
```
The first two queries didn’t find any rows that would violate a candidate foreign key. I either have a mistake, or there just aren’t any. The third query didn’t complete any time soon. I haven’t tuned this query or looked into how much load it’s putting on the server. I did an analyze on the query, but I don’t know where to go from there.

Had there been any FK violations, I assume it’s because a user is deleted without cleaning up their posts.
If the intent is to keep posts available to read when users leave (with a [deleted] author name for example), then I’d suggest implementing a soft-delete pattern, where the user-record sticks around. Add a “date-deleted” column to the users table and hide the user name from such comments.

Loading...
Reply
- Brent Ozar
  January 8, 2024 8:00 am
  
  Ah! Sorry, I should have been more specific – when reviewing the posts and comments table, only look at ids 100000 and below.
  
  Loading...
  
  Reply
How the Company-Startup Thing Worked Out For Me in 2024 – Brent Ozar
December 25, 2025 4:41 pm

[…] Query exercises with a challenge one week, followed by answers the next week […]

Loading...

Reply

Query Exercise: Find Foreign Key Problems

3 Comments. Leave new

Leave a Reply Cancel reply