Using GREATEST(), LEAST(), And Date/Time Values With COALESCE() In MySQL
The other day, I needed to find the largest and smallest date/time values across a collection of tables in MySQL. This gave me an opportunity to use MySQL's GREATEST() and LEAST() functions for the first time; however, I found out that these two functions return NULL if any argument is NULL. To cope with this, I had to pull COALESCE() into the mix and then jump through a few little hoops.
Both the GREATEST() and LEAST() functions can take 2-N arguments and return the greatest and least value, respectively. However, if any of the arguments is NULL, the entire result is NULL regardless of what the rest of the arguments hold. To get around this, I had to use COALESCE() to ensure that no argument was NULL. Of course, when doing that, I also had to ensure that the coalesced value didn't corrupt the result (ie, get accidentally selected as the result).
In my case, I was dealing with date/time values. And, after a little trial an error, I found out that things worked well if I used the empty string and the max MySQL date/time value as the coalesced fallbacks. At first, I tried to use 0 (zero), but that caused a really unfortunate data-type conversion in which the date/time values were converted to some numeric format:
SELECT GREATEST( UTC_TIMESTAMP(), 0 ) ==> 20150311111128.000000
Figuring out the appropriate fallback value was only the first step. In order to make the database result more correct, I then had to figure out when to return NULL in lieu of a valid date.
Here's an example of what I was doing with GREATEST(). Please note that I've left in pseudo-comments to give a better sense of how this would actually work across multiple tables and not just literal values.
NOTE: In these demos, I'm not selecting an aggreate - MAX() or MIN() - since I am dealing with a single value. However, in reality, each of the COALESCE() queries would be a SELECT MAX() or SELECT MIN().
<!---
When using GREATEST(), MySQL will return NULL if *any* of the arguments are NULL.
Therefore, if it is plausible for an argument to be NULL, we need to use COALESCE()
to ensure a value that is both non-NULL and, at the same time, will not corrupt the
results (ie, will not be selected as the "result" over other valid values). When
dealing with date / time values, using the empty string ('') will allow the real
dates to come through in the result-set without performing unfortunate type coercion.
--->
<cfquery name="greatestDate" datasource="testing">
SET @result = GREATEST(
COALESCE(
(
SELECT
UTC_TIMESTAMP()
/*
FROM
some_table
WHERE
some_condition
*/
),
''
),
COALESCE(
(
SELECT
CAST( '2014/07/04' AS DATETIME ) -- Past date.
/*
FROM
some_table
WHERE
some_condition
*/
),
''
),
COALESCE(
(
SELECT
NULL
/*
FROM
some_table
WHERE
some_condition
*/
),
''
)
);
/*
If none of the GREATEST() arguments returned a valid date / time value then
our result will be the empty string. As such, we want to catch that case and
return a NULL value to be a bit more "correct".
--
NOTE: In ColdFusion this doesn't really matter as NULL is translated into the
empty string, implicitly, when dealing with the query object.
*/
SELECT
NULLIF( @result, '' ) AS dateValue
;
</cfquery>
<!--- Check to see if we have a real date to work with. --->
<cfif isDate( greatestDate.dateValue )>
Date: <cfoutput>#dateFormat( greatestDate.dateValue, "mmmm d, yyyy" )#</cfoutput>
<cfelse>
Date: <em>None could be found.</em>
</cfif>
Notice that each argument being passed into the GREATEST() function is wrapped in a COALESCE() to make sure that it's never NULL. Then, once we have our GREATEST() result, we check to see if it's the fallback value - the empty string; and, if so, we return NULL back to the calling context (ColdFusion in this case).
Here's an example of what I was doing with LEAST():
<!---
When using LEAST(), MySQL will return NULL if *any* of the arguments are NULL.
Therefore, if it is plausible for an argument to be NULL, we need to use COALESCE()
to ensure a value that is both non-NULL and, at the same time, on that will not
corrupt the results (ie, will not be selected as the "result" over other valid
values). When dealing with date / time values, using the max MySQL date / time value
('9999-12-31 23:59:59') will allow the real dates to come through in the result-set
without performing unfortunate type coercion.
--->
<cfquery name="leastDate" datasource="testing">
SET @result = LEAST(
COALESCE(
(
SELECT
UTC_TIMESTAMP()
/*
FROM
some_table
WHERE
some_condition
*/
),
'9999-12-31 23:59:59'
),
COALESCE(
(
SELECT
CAST( '2015/07/04' AS DATETIME ) -- Future date.
/*
FROM
some_table
WHERE
some_condition
*/
),
'9999-12-31 23:59:59'
),
COALESCE(
(
SELECT
NULL
/*
FROM
some_table
WHERE
some_condition
*/
),
'9999-12-31 23:59:59'
)
);
/*
If none of the LEAST() arguments returned a valid date / time value then
our result will be the max MySQL date. As such, we want to catch that case
and return a NULL value so that our application doesn't think that the default
value is what we intended.
*/
SELECT
NULLIF( @result, '9999-12-31 23:59:59' ) AS dateValue
;
</cfquery>
<!--- Check to see if we have a real date to work with. --->
<cfif isDate( leastDate.dateValue )>
Date: <cfoutput>#dateFormat( leastDate.dateValue, "mmmm d, yyyy" )#</cfoutput>
<cfelse>
Date: <em>None could be found.</em>
</cfif>
The workflow between the two functions is the same - the only difference relates to what I need to use as the COALESCE() fallback value.
SQL is really awesome. Sometimes, there's nothing more satisfying than writing a query that is a perfect balance of power and readability. I often grapple with how much logic to push into the database and how much to keep to the application. But, if I can keep the SQL simple, yet effect, it feels like a big win.
Want to use code from this post? Check out the license.
Reader Comments
@All,
I should also clarify that when I am SELECT'ing from a table, and not a single value, I was employing MAX() and MIN() on the table itself:
COALESCE( ( SELECT MAX( date_column ) FROM some_table ), '' )
... in case that wasn't clear from the description.
Hi Ben,
Were LEAST and GREATEST fast enough, was thinking I would approach it naively something like this,
SELECT max(d), min(d) FROM(
SELECT UTC_TIMESTAMP() as d
UNION
SELECT CAST( '2015/07/04' AS DATETIME ) as d
UNION
SELECT NULL as d
)b
ORDER BY d ASC
but LEAST( and GREATEST( do seem more readable.
Isn't it just easier to write this?
case when arg1 <= arg2 then arg1 else coalesce(arg1, arg2) end -- least
case when arg1 >= arg2 then arg1 else coalesce(arg1, arg2) end -- greatest
@Adam,
That would work; but, I think you'd want to add a `LIMIT 1` after the ORDER BY so that you only return on result. That said, I wonder if the UNION forces the database engine to store a different kind of intermediary derived table? Meaning, since you are moving the MIN/MAX aggregate to the outer query, the inner query results (of the UNION) have to be stored in memory. The SQL optimizer probably handles it well; but, it was my only thought.
@Shawn,
In your approach, to be clear, you would have to get each intermediary result as a separate variable and then use additional SELECTs to find the MIN/MAX. I think your approach can work well if you only have 2 values. But, when you have 3 or more (such as in my demo), I think using comparison operators can become a bit unwieldy.
Also, if one of the MIN/MAX values is NULL, I think you'll have trouble with your comparison operator as comparing to NULL often leads to unexpected results (unless you really know how NULL comparisons work).
@Ben,
Yes, your only guessing until you try it with the real data. I'll try out using LEAST() and GREATER() next time I do something like that. Its a new approach for me any way guess the more possibilities the better.
@Ben,
When I first read it I thought you were jumping through unnecessary hoops to write two queries against the same table (but different columns) and then finally combining them at the end. (Part of my confusion was the result of reading "/* FROM some_table WHERE some_condition */" as the same in each instance.)
So anyway, I wasn't suggesting that GREATEST() and LEAST() aren't handy functions to have around. I do want to point out that the NULL values in my case expressions are handled correctly by falling through to a COALESCE(). But I did mean to write it's arguments in the reverse order, as below, which is critical to getting it to work right. ;)
case when arg1 <= arg2 then arg1 else coalesce(arg2, arg1) end -- least
case when arg1 >= arg2 then arg1 else coalesce(arg2, arg1) end -- greatest
@Shawn,
Ah, I can dig it. Yea, when I first wrote the code, I suddenly realized I was doing nothing but selecting "literals". So, I added the "/* FROM some_table WHERE some_condition */" stuff to try to indicate that, in reality, these would be more robust table-oriented queries. Probably, I should have just made some test tables to make the demo a bit less confusing.
Very helpful.
Often times when I have to compare dates I convert them to floating numbers: FLOOR(CAST(dateValue AS FLOAT))
FLOOR just gets rid of decimals as I usually don't care to know the time and just need date. I've found dealing with numbers versus dates is much easier, but can be costly if not careful... I wonder though how or even if this same principle could be applied to your example of selecting the newest and oldest dates? Because once found the floating number can be easily cast back to date/time.
I should clarify, working with dates versus comparing.
@Gabriel,
I'm a huge fan of working with dates as numbers. And, while it uses a different "zero date", ColdFusion also allows for the same thing (though sometimes you have to be careful with how it casts). I call it "date math." It makes me happy.
You have presented the relevant code to show the error which is triggered due to invalid data. It provides a great understanding about the chances of error during the design. Thanks for sharing the data.
I have to admit that you have done a sensational job here today. I hope you keep this going on.Thank you