Goodbye GROUP_CONCAT(), Hello JSON_ARRAYAGG() And JSON_OBJECTAGG() In MySQL 5.7.32
The other day, when aggregating some SQL data for a Customer Success report, I went to look up the semantics for the GROUP_CONCAT()
function in MySQL. I rarely use the GROUP_CONCAT()
function since I've had several problems with it in the past, from failing silently to it bringing back incorrect data (which I believe was a bug in the version of MySQL I was using at the time). As such, I haven't committed the function signature to memory. That said, when I got to the documentation, I noticed two other methods documented right next to it: JSON_ARRAYAGG()
and JSON_OBJECTAGG()
. I had never seen these before; but, oh chickens, these MySQL functions seem awesome for group aggregation. I'll definitely be using these instead of GROUP_CONCAT()
going forward in my MySQL 5.7.32 reporting.
A while back, I mentioned that we were upgrading to MySQL 5.7.32 at work. And, that I was excited to see that such an upgrade would introduce the JSON (JavaScript Object Notation) column-type and relevant manipulation functions. But somehow, when I was learning about the JSON support in MySQL, I missed the fact two new aggregation functions were added as well:
JSON_ARRAYAGG( column )
JSON_OBJECTAGG( key_column, value_column )
Just like MIN()
, MAX()
, and GROUP_CONCAT()
, the JSON aggregation functions allow us to extract information from a set of rows that have been collapsed through row-grouping (ex, using GROUP BY
). Only, they allow us to extract that information into higher-level data structures, not just delimited strings.
The JSON_ARRAYAGG()
function gathers all the values in a given column and returns then in a single, aggregated JSON Array.
The JSON_OBJECTAGG()
function allows us to gather data from two columns and returns them in a single, aggregated JSON Object. Within this JSON Object, one column defines the key and one column defines the value. Each row within the grouping generates a key-value pair to be added to the aggregate Object.
With SQL like this, the easiest way to understand it is to see an example. So, let's consider the database schema for my blog. I have "blog entries" and I have "tags" (ex, "ColdFusion", "SQL", "JavaScript"). There exists a many-to-many relationship between blog entries and tags that get codified using a "join table":
blog_entry
tag
blog_entry_tag_jn
- imagine a two-column table that does nothing but hold the primary keys from the other two tables in the many-to-many relationship.
Given this high-level schema, let's look at how I might query for blog entries and - for each blog entry record - pull back information about the tags associated with that entry. I'm going both the JSON_ARRAYAGG()
and JSON_OBJECTAGG()
functions in the same query so that we can compare the two results:
SELECT
e.id,
e.name,
-- Pull back all the tag names a single array. Each tag will be an item
-- within the array.
JSON_ARRAYAGG( t.name ) AS tagNames,
-- Create an object with the schema { tag.id : tag.name }. Each tag ID will
-- be a key within the object.
JSON_OBJECTAGG( t.id, t.name ) AS tagIndex
FROM
blog_entry e
INNER JOIN
blog_entry_tag_jn jn -- Our many-to-many join table.
ON
jn.blog_entry_id = e.id
INNER JOIN
tag t
ON
t.id = jn.tag_id
-- Since we're GROUPING on the blog entry records, all of the `INNER JOIN` tag
-- information is going to be collapsed. However, we can extract aggregation
-- information about the tags using our JSON functions above!
GROUP BY
e.id
HAVING
COUNT( * ) > 1 -- To make the grouping more exciting!
ORDER BY
e.id DESC
LIMIT
10
Since we're using a GROUP BY
on the blog entries, all of the many-to-many tag information is getting collapsed down into a grouping. However, we can then extract that grouping information on a per-entry-row basis by using the JSON aggregation functions. And, when we run this SQL code, we get the following results:
As you can see, the JSON_ARRAYAGG()
function gathers each tag name
and pushes it onto a single array. And, the JSON_OBJECTAGG()
function gathers each tag id
and name
and appends them to a single object (using the id
column as the key and the name
column as the value). How sweet is this!
Now that we have JSON (JavaScript Object Notation) aggregation functions in MySQL 5.7 (added in 5.7.22), I can't think of a reason that I'd want to use the GROUP_CONCAT()
function anymore. It seems that having structured data is always better than having a delimited list. But, I did just stumble upon these functions, so maybe there are use-cases I haven't considered yet.
Epilogue on JSON Aggregation Limitations
Since the JSON aggregation functions roll-up many rows into a single value, I have to assume that there are limits to the amount of data that it can return (just as there is with the GROUP_CONCAT()
function). But, I'm having trouble finding anything explicitly documented. I assume that the aggregate data is still bound by the max_allowed_packet
setting. But, I'm not sure if there are any other settings which may further reduce the amount of data that can be returned in each aggregation. It's just something to keep in mind.
Want to use code from this post? Check out the license.
Reader Comments
Very interesting! I'm not sure I see a use case for the
JSON_OBJECTAGG()
function, but I could definitely see howJSON_ARRAYAGG()
could replace theGROUP_CONCAT()
.I use
GROUP_CONCAT()
to pull back a list of users in my app. I'd love to hyperlink each user such that clicking that user emails them or goes to their profile. I haven't spent much time trying to solve this problem. I've only found a way to return the name (so far). When I tried to return the email to hyperlink each name, I ran into trouble. I don't recall the trouble anymore. That was a while back.@Chris,
I'm still getting my head wrapped around this stuff, but speaking to your user-email problem, I think you could so something like this:
This would return an object in which the
email
is the key and thename
is the value. So, something like:This way, you can pull back both the name and the email, which is what is sounds like you need to link in your example. But, I don't really know your context; and, I only just learned about these functions - so your mileage may vary 😉
This is really exciting, I have been doing a few things with JSON in MySQL 5.7 but assumed most of the JSON functions were only in MySQL 8 so I have been doing everything in ColdFusion/javascript which is super limiting and slow. After looking at the docs, the number of JSON functions available after 5.7.23 are quite extensive.😀
@Scott,
Yeah, I was excited to see the JSON functions sneaking in there! I actually just upgraded the MySql engine for this site to 8.x. But, I don't even know what is new there yet - I gotta carve out some time to look at the changelog.
This gave me an idea for a new QoQ feature in Lucee. It would be pretty easy to make Lucee's native QoQ do this sort of thing.
https://luceeserver.atlassian.net/browse/LDEV-3895
@Brad,
Very cool!! I have to say, you're single-handedly making query-of-queries fashionable again 😮 what with your speed improvements and aggregation ideas. I think people forget just how powerful Query objects are 💪
Literally just hit another issue in my code from using
group_concat
due to length truncation. Glad this post was written so I have an alternative solution. Thanks! 😀@Scott,
I wish there was a
:shakes-fist:
emoji! We have one at work - Grandpa Simpson shaking fist at cloud (or something). I wonder what it would take to get some custom emoji working here ... eh, probably a bad idea.But, that's neither here nor there - excited to see that you have a path forward now!
@Ben,
During exploring I got even more excited with using the
JSON_OBJECT
function that easily lets me generate grouped JSON output:output serializeJSON:
@Scott,
That's something I need to play around with more. I know that they exist; but, I haven't really started to use them in any way. I wonder if that would be an interesting way to return data across multiple tables in a
JOIN
. So, instead of mapping sets of columns for each table, return a JSON object for each table. Meaning, instead of this:... you have something like this:
... though, looking at it, the later is definitely more noisy. But, something to let soak in the back of my brain.
@Ben,
It depends on the use case, for me I need to use it in javascript so previously I was generating my structs in ColdFusion, now that the entire process is automatically created by the database engine, including typecasting and character escaping (I'm assuming), making the process faster and seamless.
The other wonderful thing in the more recent versions of 5.7 is the JSON column and its ability to query/filter by specific keys. Until then I have been storing things as
longtext
when needing to store JSON in the database and its a bit nerveracking 😳I finally did my own follow-up on nesting JSON functions inside the
JSON_ARRAYAGG()
andJSON_OBJECTAGG()
aggregation functions as identified by Scott Steinbeck:www.bennadel.com/blog/4350-nesting-json-functions-inside-json-arrayagg-and-json-objectagg-in-mysql-5-7-38.htm
This is some pretty cool stuff!
Just wondering if there has been any performance comparisons done between JSON_ARRAYAGG and GROUP_CONCAT?
@Mike,
I haven't personally done any performance comparisons. But, I also haven't used this technique on any volume of data where performance would become a problem.
@Mike,
I would assume GROUP_CONCAT is faster, but just be aware, there's a hard limit on how many results GROUP_CONCAT can return that is not the case in JSON_ARRAYAGG I don't believe
@Scott,
Oh man, I've been burned by that hard in the past! We had a SQL query where the
GROUP_CONCAT()
was just silently truncating values. We were then using that list of truncated values as input into another query ... and imagine that when the truncation happened, it was splitting some INT values in half ... 😱@Ben,
Yikes 😱, it is definitely a limit that should either not exist or not fail silently. I have had a similar experience where I was sending out text notifications to a list of users and about 20% weren't receiving anything. It was really hard to track down.
@Scott,
Ooof, I feel your pain 😰
Hi Ben,
thanks a lot for your awesome article about JSON on MySQL (and MariaDB)!
I spent many hours before find that informations.
👍👍👍
@Stefano,
Awesome, so glad this was helpful 🙌
Thanks for this useful post! Just a question: is there a way to convert JSON_OBJECTAGG to an associative array using only MySQL (no PHP)? I mean something like this:
from
to
@Tom,
I'm not too familiar with the semantics of PHP; but, what is the difference between an Object and an Associative Array; I kind of thought those were actually the same thing?
No, they're not the same thing.
In PHP, to iterate an object, you need to convert it in an associative array, this way:
What I need to know is if - using MySQL 8 - there's a way to get an associative array directly from a SQL query, avoiding the PHP step.
@Tom,
Ah, ok, I think I see what you're asking now. The column value coming out of the MySQL database is a string, even when the column type is defined as
json
; or, when you use something likeJSON_ARRAYAGG()
orJSON_OBJECTAGG()
to collect data. At the end of the workflow, what the database gives you is a string representation of the serialized object(s).As far as I know, you will always have to decode the stringified data into a native object in your specific language (like PHP, JavaScript, etc).
That said, I'm not a database expert; so, it's possible that there are pathways that I don't understand very well. For example, the MySQL 8x version has an alternate API for storing document data called the
X Dev API
. I've only played around with it a tiny bit; but, I think it will give you native objects, not just serialized values. But, I'm just shooting in the dark.@Ben Nadel,
Thanks! I'm not a database expert, too. But today your post gave me the occasion to learn something new: JSON_OBJECTAGG's key cannot be null, otherwise you get the error "PDOException: SQLSTATE[22032]: <>: 3158 JSON documents may not contain NULL member names". This can be a problem in case of a blog_entry without any associated tags. It doesn't happen using GROUP_CONCAT().
@Tom,
Learning is what's we're here for -- glad that at least there was some inspiration shared here :)
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →