MySQL GROUP_CONCAT() Has Buggy Interaction With UNION Clauses
So, earlier this week, I blogged about the unfortunate way in which MySQL's GROUP_CONCAT() function will fail silently when you hit its size limit. Well, it turns out that GROUP_CONCAT() is even more unfortunate. After some testing, it looks like GROUP_CONCAT() fails even harder (and more mysteriously) when you use it in conjunction with a UNION or UNION ALL clause.
To see what I mean, I'm going to revamp my previous demo in which I am grouping transaction IDs by day of the week. This time, however, I'm going to run the report twice. The first time, I'm using the same code as earlier; the second time, I'm gathering each day of the week individually, as an aggregate, and then joining all the individual aggregates using UNION ALL clauses:
<!--- | |
First, let's populate the audit-log for testing. Here, we're doing to store the day | |
of the week and the transaction ID that took place. | |
---> | |
<cfquery name="populate" datasource="testing"> | |
TRUNCATE TABLE audit_log; | |
<cfset transactionIndex = 100000 /> | |
<!--- For each day of the week, add 1,000 transactions. ---> | |
<cfloop index="dayIndex" from="1" to="7" step="1"> | |
<cfloop index="i" from="1" to="1000" step="1"> | |
INSERT INTO audit_log | |
( | |
dayOfWeek, | |
transactionID | |
) VALUES ( | |
#dayIndex#, | |
#( ++transactionIndex )# | |
); | |
</cfloop> | |
</cfloop> | |
</cfquery> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<!--- | |
Now, let's report on the transactions that have taken place. Grouping the log by the | |
day of the week, we're going to concatenate the transactionIDs into a grouped-list. | |
---> | |
<cfquery name="report" datasource="testing"> | |
SELECT | |
dayOfWeek, | |
<!--- Collapse the transaction IDs into a comma-delimited list. ---> | |
GROUP_CONCAT( transactionID ) AS transactionList | |
FROM | |
audit_log | |
GROUP BY | |
dayOfWeek | |
ORDER BY | |
dayOfWeek ASC | |
</cfquery> | |
<!--- Output report aggregates. In "theory", each day should have 1,000 items. ---> | |
<cfoutput query="report"> | |
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br /> | |
</cfoutput> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<br /> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<!--- | |
This time, rather than grouping the entire record-set by day, we're going to select | |
each aggregate individually by day of the week, then UNION these aggregates together. | |
In theory, this should be the exact same result since we're still groupging all | |
transactions by day of the week. | |
---> | |
<cfquery name="report" datasource="testing"> | |
( | |
SELECT | |
dayOfWeek, | |
GROUP_CONCAT( transactionID ) AS transactionList | |
FROM | |
audit_log | |
WHERE | |
dayOfWeek = 1 <!--- Get only this day of the week. ---> | |
GROUP BY | |
dayOfWeek | |
) | |
<!--- UNION the other 6 days of the week. ---> | |
<cfloop index="i" from="2" to="7" step="1"> | |
UNION ALL | |
( | |
SELECT | |
dayOfWeek, | |
GROUP_CONCAT( transactionID ) AS transactionList | |
FROM | |
audit_log | |
WHERE | |
dayOfWeek = #i# <!--- Get only this day of the week. ---> | |
GROUP BY | |
dayOfWeek | |
) | |
</cfloop> | |
ORDER BY | |
dayOfWeek ASC | |
</cfquery> | |
<!--- Output report aggregates. ---> | |
<cfoutput query="report"> | |
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br /> | |
</cfoutput> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<br /> | |
<!--- ----------------------------------------------------- ---> | |
<!--- ----------------------------------------------------- ---> | |
<!--- | |
As a final sanity check, let's look at just one of the groupings that we used in | |
the previous exammple, to see how it behaves when we don't use a UNION clause. | |
---> | |
<cfquery name="report" datasource="testing"> | |
SELECT | |
dayOfWeek, | |
GROUP_CONCAT( transactionID ) AS transactionList | |
FROM | |
audit_log | |
WHERE | |
dayOfWeek = 1 <!--- Get only this day of the week. ---> | |
GROUP BY | |
dayOfWeek | |
</cfquery> | |
<!--- Output report aggregates. ---> | |
<cfoutput query="report"> | |
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br /> | |
</cfoutput> |
In theory, these two reports should show the exact same result. Yet, when we run the code, we get the following output:
Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147Day Of Week (1): 49
Day Of Week (2): 49
Day Of Week (3): 49
Day Of Week (4): 49
Day Of Week (5): 49
Day Of Week (6): 49
Day Of Week (7): 49Day Of Week (1): 147
As you can see, when we gather each day of the week individually, in the context of a UNION ALL, GROUP_CONCAT() is returning an even shorter list (49 items). Of course, if we do the same thing without a UNION ALL (the final sanity check), we get the expected 147 item truncation.
What the heck is going on here? I feel like I'm taking crazy pills! This makes no sense at all. Not only does GROUP_CONCAT() fail silently, it also doesn't seem to fail consistently. This pretty much eliminates it as a viable option for me.
NOTE: I am running MySQL 5.6.15.
Want to use code from this post? Check out the license.
Reader Comments
Question! And this is only I am too lazy right now to try it myself.... Instead of using 1000, would you mind trying 147 and 49, and posting the results of the UNION query?
@Tanyar,
If I do 147, then I still get 147 in the first group (since that's all there is). And, more importantly, I still get 49 in the UNION-based select. So, that's one's still problematic even if I am not truncating my result set.
Oddly I get 147 throughout guess its into mysql settings their though because my version is only a little off yours 5.6.14
Guess that just makes it worse :)
I'm getting 147 throughout using MariaDB 10.0.x (which matches MySQL 5.5/6). I use GROUP_CONCAT quite a bit, including with UNION statements, and it's always seemed pretty reliable.