Performing A LEFT OUTER JOIN On An INNER JOIN In Order To Write More Expressive SQL In MySQL 5.6.37
Over the last week, I've been working with one of our senior data scientists, Kaitlin Seders, to generate reports from our MySQL and Redshift databases. It's been both thrilling and exhausting - when you think that you know your SQL pretty well, try sitting down with a data scientist and prepare to have your ego adjusted! During our time together, I've been using a lot of SQL techniques that I don't ordinarily use. For example, I used SELECT
and UNION
to create derived data-sets in order to JOIN
CSV data with table data. Another SQL feature that I whipped out was the ability to perform a LEFT OUTER JOIN
on an INNER JOIN
in MySQL. The syntax for this is a bit strange; but, it allows you to write queries that more accurately describe your intent.
To demonstrate this technique, let's create the following friend
table:
mysql> SELECT * FROM friend;
+----+--------+-----+----------------------------------------------+
| id | name | age | catchPhrase |
+----+--------+-----+----------------------------------------------+
| 1 | Tina | 39 | If it ain't broke, try harder. |
| 2 | Danny | 28 | |
| 3 | Ellen | 50 | Whatchamhoozy. |
| 4 | Ralph | 8 | Cookies! |
| 5 | Sandi | 33 | Don't forget to stop and smell the chickens! |
| 6 | Joanna | 42 | Yep yep yep/ |
| 7 | Cole | 30 | It ain't over till it's over. |
+----+--------+-----+----------------------------------------------+
7 rows in set (0.00 sec)
And, to give us something interesting to JOIN
against, let's create a friend_relationship
table that models inter-friend relationships that are either benevolent (feeling='friend'
) or truculent (feeling='enemy'
):
mysql> SELECT * FROM friend_relationship;
+----+-------------+-------------+---------+
| id | friendIdOne | friendIdTwo | feeling |
+----+-------------+-------------+---------+
| 1 | 1 | 3 | friend |
| 2 | 1 | 5 | friend |
| 3 | 5 | 1 | enemy |
| 4 | 2 | 4 | friend |
| 5 | 4 | 2 | friend |
| 6 | 5 | 2 | friend |
+----+-------------+-------------+---------+
6 rows in set (0.00 sec)
Given these two tables, imagine that we want to pull back the list of friends alongside the list of inter-friend friendships. In other words, we want to see all the friends plus the friends that those friends are friends with. When doing this, we know two things:
Not all
friend
records have correspondingfriend_relationship
records because not all friends know each other.All
friend_relationship
records must correspond to records in thefriend
table since thefriend_relationship
table is basically a glorified "join" table with foreign-key references.
To translate that into "join intent", we want:
friend
==> LEFT OUTER JOIN
==> friend_relationship
And,
friend_relationship
==> INNER JOIN
==> friend
A naive attempt to codify this in a SQL query might look like this:
SELECT
f.id,
f.name,
f.age,
( otherFriend.id ) AS friend_id,
( otherFriend.name ) AS friend_name,
( otherFriend.age ) AS friend_age
FROM
friend f
-- Since NOT ALL friends are going to have relationships with each other, we need
-- to perform a LEFT JOIN so as not to reduce the records from the first table.
LEFT OUTER JOIN
friend_relationship r
ON
(
r.friendIdOne = f.id
AND
r.feeling = 'friend'
)
-- Get all the Friend records that correspond to the RIGHT side of the relationship.
-- CAUTION: The INNER JOIN here does NOT WORK as you might expect!
INNER JOIN
friend otherFriend
ON
otherFriend.id = r.friendIdTwo
ORDER BY
f.name ASC,
otherFriend.name ASC
;
Here, we are trying to capture the two joins as previously articulated - the first one being a LEFT OUTER JOIN
and the second one being an INNER JOIN
. Unfortunately, this does not work. The INNER JOIN
ends up limiting the result-set since it speaks to the final cross-product. As such, when we run the above SQL query in MySQL, we get the following output:
+----+-------+-----+-----------+-------------+------------+
| id | name | age | friend_id | friend_name | friend_age |
+----+-------+-----+-----------+-------------+------------+
| 2 | Danny | 28 | 4 | Ralph | 8 |
| 4 | Ralph | 8 | 2 | Danny | 28 |
| 5 | Sandi | 33 | 2 | Danny | 28 |
| 1 | Tina | 39 | 3 | Ellen | 50 |
| 1 | Tina | 39 | 5 | Sandi | 33 |
+----+-------+-----+-----------+-------------+------------+
5 rows in set (0.00 sec)
We have 7 friends in the friend
table. However, this SQL query only brought back 5 friends. That's because the INNER JOIN
on the latter two tables "accidentally" filtered-out the friends who have no relationship with each other.
One "fix" for this is to simply convert the INNER JOIN
to a LEFT OUTER JOIN
:
SELECT
f.id,
f.name,
f.age,
( otherFriend.id ) AS friend_id,
( otherFriend.name ) AS friend_name,
( otherFriend.age ) AS friend_age
FROM
friend f
-- Since NOT ALL friends are going to have relationships with each other, we need
-- to perform a LEFT JOIN so as not to reduce the records from the first table.
LEFT OUTER JOIN
friend_relationship r
ON
(
r.friendIdOne = f.id
AND
r.feeling = 'friend'
)
-- Get all the Friend records that correspond to the RIGHT side of the relationship.
-- However, since we already on the other side of a LEFT OUTER JOIN, we can use
-- another LEFT OUTER JOIN so as to not to accidentally limit the cross-product from
-- the previous JOIN.
LEFT OUTER JOIN
friend otherFriend
ON
otherFriend.id = r.friendIdTwo
ORDER BY
f.name ASC,
otherFriend.name ASC
;
When we change the INNER JOIN
to a LEFT OUTER JOIN
and run this SQL query in MySQL, we end up with the records that we wanted:
+----+--------+-----+-----------+-------------+------------+
| id | name | age | friend_id | friend_name | friend_age |
+----+--------+-----+-----------+-------------+------------+
| 7 | Cole | 30 | NULL | NULL | NULL |
| 2 | Danny | 28 | 4 | Ralph | 8 |
| 3 | Ellen | 50 | NULL | NULL | NULL |
| 6 | Joanna | 42 | NULL | NULL | NULL |
| 4 | Ralph | 8 | 2 | Danny | 28 |
| 5 | Sandi | 33 | 2 | Danny | 28 |
| 1 | Tina | 39 | 3 | Ellen | 50 |
| 1 | Tina | 39 | 5 | Sandi | 33 |
+----+--------+-----+-----------+-------------+------------+
8 rows in set (0.00 sec)
As you can see, we got back all 7 friends in the friend
table, including those that have no inter-friend relationship. And, of course, we got back multiple rows for friends (Tina
) that have a relationship with several other friends.
Chaining two LEFT OUTER JOIN
together works. But, it feels gross since we've fundamentally changed the intent of the relationships between the various tables. Were someone else to come and look at this SQL query, seeing the second LEFT OUTER JOIN
would indicate to them that we may have friend_relationship
records that have no corresponding friend
records. Of course, that's never going to be the case - that was just something we did to satisfy the query.
To get the same results - but maintain the intent of the query - we can perform the LEFT OUTER JOIN
on the product of an INNER JOIN
:
SELECT
f.id,
f.name,
f.age,
( otherFriend.id ) AS friend_id,
( otherFriend.name ) AS friend_name,
( otherFriend.age ) AS friend_age
FROM
friend f
-- Since NOT ALL friends are going to have relationships with each other, we need
-- to perform a LEFT JOIN so as not to reduce the records from the first table.
LEFT OUTER JOIN
(
-- Get all the Friend records that correspond to the RIGHT side of the
-- relationship. And, while we need to use a LEFT OUTER JOIN on the first
-- table, we know that each joined record MUST CORRESPOND to a row from the
-- friend table. As such, we can use an INNER JOIN inside of our LEFT OUTER
-- JOIN in order to treat the following tables as a "unit".
friend_relationship r
INNER JOIN
friend otherFriend
ON
(
r.feeling = 'friend'
AND
otherFriend.id = r.friendIdTwo
)
)
ON
r.friendIdOne = f.id
ORDER BY
f.name ASC,
otherFriend.name ASC
;
With this syntax, the joins now read exactly the way we intended them to work: the friend
table may or may not correspond to the friend_relationship
table; but, every record returned from friend_relationship
must correspond to a subsequent record in the friend
table.
And, when we run this SQL query in MySQL, get the following output:
+----+--------+-----+-----------+-------------+------------+
| id | name | age | friend_id | friend_name | friend_age |
+----+--------+-----+-----------+-------------+------------+
| 7 | Cole | 30 | NULL | NULL | NULL |
| 2 | Danny | 28 | 4 | Ralph | 8 |
| 3 | Ellen | 50 | NULL | NULL | NULL |
| 6 | Joanna | 42 | NULL | NULL | NULL |
| 4 | Ralph | 8 | 2 | Danny | 28 |
| 5 | Sandi | 33 | 2 | Danny | 28 |
| 1 | Tina | 39 | 3 | Ellen | 50 |
| 1 | Tina | 39 | 5 | Sandi | 33 |
+----+--------+-----+-----------+-------------+------------+
8 rows in set (0.00 sec)
As you can see, this LEFT OUTER JOIN
on the product of an INNER JOIN
gives us the same results as the chained LEFT OUTER JOIN
approach. So, we get the records that we wanted while much more clearly expressing the intent of the query.
One thing to note about this approach is that the inner INNER JOIN
can only reference tables within that inner join. If you need to reference another table, from a previous join, you have to do so in the outer ON
clause (of the "product" and the previous tables).
Isn't SQL just thrilling?! What a beautifully expressive language. It's no surprise to me at all that it's stood the test of time.
Want to use code from this post? Check out the license.
Reader Comments