Skip to main content
Ben Nadel at InVision In Real Life (IRL) 2018 (Hollywood, CA) with: Juan Agustín Moyano
Ben Nadel at InVision In Real Life (IRL) 2018 (Hollywood, CA) with: Juan Agustín Moyano

Using Multiple Common Table Expressions In One SQL Query In MySQL

By
Published in Comments (10)

A couple of years ago, I upgraded this blog from MySQL 5.7.10 to MySQL 8.0.28. This gave me access to newer SQL features like Common Table Expressions (CTE) and JSON column types. But, my blog requires little more that CRUD (Create, Read, Update, Delete) operations; so, there's not much of need for me to write advanced SQL statements.

At work, however, we recently upgraded to MySQL 8. And this finally gave me an opportunity to start playing around with more advanced concepts due to the large number of one-off reports that I have to run. A couple of weeks ago, I looked at using VALUES and ROW constructs to create a common table expression. And, as a follow-up to that, I just learned that you can have multiple CTEs within a single MySQL query. And, that these CTEs can reference each other. So freaking cool!

To demonstrate, I'm going to build-up a SQL query using common table expressions. First, we'll start with a list of fake email addresses. This is something that I often have to do, copy-pasting them from a CSV (Comma Separated Values) file. The following CTE just gets the email addresses into a consumable format:

WITH
	emails ( `email` ) AS (

		VALUES
			ROW( 'laura@acme.example' ),
			ROW( 'tim@acme.example' ),
			ROW( 'anna@masters.example' ),
			ROW( 'sonya@vids.example' ),
			ROW( 'robert@acme.example' ),
			ROW( 'alice@vids.example' ),
			ROW( 'martha@chomp.example' ),
			ROW( 'douglas@chomp.example' )

	)

This creates a derived table, emails, with a single column, email. Common table expressions can be referenced by the main SQL statement; but, they can also be referenced by other CTEs in the same query. And that's exactly what we'll do next—create another CTE that builds upon the prior CTE and extracts the domain from each row:

WITH
	-- ... truncated SQL query ...
	deconstructed ( `email`, `domain` ) AS (

		SELECT
			e.email,
			SUBSTRING_INDEX( e.email, '@', -1 ) AS domain
		FROM
			emails e -- CTE reference!

	)

As you can see, this CTE is querying from the previous CTE, emails. Then, it's using the SUBSTRING_INDEX() function to parse the domain out of each email, creating yet another derived table / CTE.

Next, we'll use this new CTE to create another CTE which groups the emails by domain and records the COUNT() statistics:

WITH
	-- ... truncated SQL query ...
	stats ( `domain`, `emailCount` ) AS (

		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

	)

As you can see, this CTE is querying from the previous CTE, deconstructed.

Next, we'll create another CTE which again references the previous CTE, deconstructed. But, this time, instead of getting the count, we'll collect the email addresses—per domain—into a JSON aggregation.

Note: I could have combined the following CTE with the previous CTE and used both the COUNT(*) and the JSON_ARRAYAGG() in the same query; but, splitting them up allowed me to explore the space a bit more.

WITH
	-- ... truncated SQL query ...
	aggregated ( `domain`, `collection` ) AS (

		SELECT
			d.domain,
			JSON_ARRAYAGG( d.email ) AS collection
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

	)

So far, we've only created CTEs that reference other CTEs. But, these CTEs merely setup utility tables, they don't actually return data to the client. Now it's time to write the actual SQL that returns actual data. This SQL will combine multiple CTEs from above:

WITH
	-- ... truncated SQL query ...
SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;

As you can see, we're taking two of the CTEs, stats and aggregated, and we're JOINing them together.

It's just CTEs all the way down! We've looked at the individual parts in isolation. Now, here's the entire SQL query in one query:

WITH
	-- First, we'll start with a common table expression (CTE) for the email addresses.
	emails ( `email` ) AS (

		VALUES
			ROW( 'laura@acme.example' ),
			ROW( 'tim@acme.example' ),
			ROW( 'anna@masters.example' ),
			ROW( 'sonya@vids.example' ),
			ROW( 'robert@acme.example' ),
			ROW( 'alice@vids.example' ),
			ROW( 'martha@chomp.example' ),
			ROW( 'douglas@chomp.example' )

	),
	-- Second, we'll create a common table expression (CTE) that extracts the domain from
	-- the email address.
	deconstructed ( `email`, `domain` ) AS (

		SELECT
			e.email,
			SUBSTRING_INDEX( e.email, '@', -1 ) AS domain
		FROM
			emails e -- CTE reference!

	),
	-- Third, we'll create a common table expression (CTE) that provides some stats for
	-- how many emails belong to each domain.
	stats ( `domain`, `emailCount` ) AS (

		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

	),
	-- Fourth, we'll create a common table expression (CTE) that groups the emails by
	-- domain and provides the collection of emails as a JSON payload.
	aggregated ( `domain`, `collection` ) AS (

		SELECT
			d.domain,
			JSON_ARRAYAGG( d.email ) AS collection
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

	)
SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;

And, when we run this MySQL query, we get the following output:

Navicat SQL client showing a partial SQL query and the results of a JOIN product of multiple common table expressions in MySQL.

When it comes to writing basic business applications, I don't need these sassy SQL mechanics. But, when it comes to reporting and other types of one-off data introspection tasks, features like VALUES/ROW, iteratively enhanced common table expressions, and JSON aggregations are just amazing! SQL is a truly wonderful language.

Want to use code from this post? Check out the license.

Reader Comments

2 Comments

One powerful use of CTEs in queries is to recursion. For example, if you have a table that has some kind of hierarchy, you can use CTEs to walk up or down the hierarchy from a defined starting point.

15,848 Comments

@Carl,

I've seen the recursive feature in the docs, but I've never actually tried it myself. The syntax looks a little strange; but, I imagine it makes some complex data relationships possible to query.

Historically, when I have a parent-child relationship to pull back, I'll just pull back all the "nodes" back, and then build the relationship in the ColdFusion code itself. I should take one of those scenarios and try to rework it use SQL recursion to see how it feels.

14 Comments

Just checking that this is a 'forced example on purpose', given the same result can be achieved with a simple GROUP BY on the calculated domain column;

WITH
emails ( 'email' ) AS (
    VALUES
        ROW( 'laura@acme.example' ),
        ROW( 'tim@acme.example' ),
        ROW( 'anna@masters.example' ),
        ROW( 'sonya@vids.example' ),
        ROW( 'robert@acme.example' ),
        ROW( 'alice@vids.example' ),
        ROW( 'martha@chomp.example' ),
        ROW( 'douglas@chomp.example' )

)
SELECT
    SUBSTRING_INDEX( e.email, '@', -1 ) AS domain,
    count(*) AS emailCount,
    JSON_ARRAYAGG( e.email ) AS collection
FROM
    emails e 
GROUP BY
    domain
;

+-----------------+------------+-------------------------------------------------------------------+
| domain          | emailCount | collection                                                        |
+-----------------+------------+-------------------------------------------------------------------+
| acme.example    |          3 | ["laura@acme.example", "tim@acme.example", "robert@acme.example"] |
| chomp.example   |          2 | ["martha@chomp.example", "douglas@chomp.example"]                 |
| masters.example |          1 | ["anna@masters.example"]                                          |
| vids.example    |          2 | ["sonya@vids.example", "alice@vids.example"]                      |
+-----------------+------------+-------------------------------------------------------------------+
15,848 Comments

@Ian,

I will grant that my example is more contrived than is necessary, in order to show that multiple CTEs can be used and chained together. That said, I do think that some degree of breaking things apart makes it more readable / understandable.

For example, in your query, it almost feels like a chicken-and-egg problem. Your GROUP BY depends on the calculation of domain in the SELECT. But then, your COUNT(*) depends on the grouping of the domains... which depends on the SELECT... which depends on the GROUP BY, and so on 😄

I'm actually a little surprised that is valid (which it clearly does given your output).

So, long story short, yes, it's more contrived than it needed to be; but, not as contrived as I think you think it is. But, it's certainly a matter of subjectivity.

14 Comments

@Ben Nadel,

The reason it 'works' in MySQL (this is a MySQL specific article) is that MySQL doesn't insist you include all the non-aggregate columns in the GROUP BY. My query wouldn't work in e.g. Postgres, as Postgres would insist I also included e.email in my GROUP BY, which would stop it grouping at the domain level.

We're already in the 'you're not porting that puppy to A.N. Other RDMS without some headscratching' territory, so that little detail aside...

My select is just a bog-standard 'column plus aggregates'. OK, that column is 'calculated', but it's a simple row level calc.

So, I'm not suprised that you're suprised it works, but I am suprised, given it does, that you think it's subjective how close/far apart these two queries are in 'understandability'. I think anyone (with any SQL knowledge!) would immediately see what my query is doing (even if they too were suprised that it works). For all its explicatory pluses, yours is not a 'glance and I've got it' query!

Hey ho, on we go...

15,848 Comments

@Ian,

Totally, at a glance, it's not too hard to understand what it is intending to do. I think one can easily pattern-match on the GROUP BY + COUNT pattern and not actually think too hard about the underlying mechanics.

That said, I've often written SQL that I think should work, only be yelled at by the query analyzer 🙃

14 Comments

@Ben Nadel,

Mine is a query I'd hope would work, as for me it's the most direct/understable conversion from English to SQL, given the output requirements. And I'd point out the (easily inferred) intention matches the result!

In other universe, version 8.5 of MySQL removes the requirement to include all non-aggregate columns in a GROUP BY and I'm writing an article about how you can now get rid of all those obfuscatory CTEs😝

15,848 Comments

@Ian,

Oh very cool - please cross-post it here when you have something to read. 🙌

Another thought I just had, and this is a general one about CTEs, not necessarily related to this specific query. I would guess that a CTE makes it easier to debug a query because it gives you an intermediary result that you can output (ie, SELECT on). I draw this parallel to regular programming where having intermediary variables makes break-points and debugging a bit easier to manage.

14 Comments

@Ben Nadel,

sorry Ben - my article comment was tongue in cheek.

I was imagining a universe where MySQL's ability to not have to include all non-aggregate columns in the GROUP BY didn't exist currently, and was then brought into an upcoming version. Thereby allowing calculated columns to work in GROUP BY clauses, and thus negating the need for CTEs in your original query, and allowing my query to work.

So, I'd be treating this as a great new feature, in a way you're doing with CTE's here. I was, in short, 'doing an irony'

15,848 Comments

@Ian,

Ha ha, I can dig it - I've got Monday brain :) That said, I do absolutely love that MySQL doesn't require all columns to be in the GROUP BY. I came originally from a Microsoft SQL Server as my first DB tech, which needed all the columns. The relaxed nature of MySQL, in that regard, is so much nicer.

Post A Comment — I'd Love To Hear From You!

Post a Comment

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel