Performing A Case-Sensitive Search On A Case-Insensitive Column Using COLLATE In MySQL 5.6.49
Applications evolve over time. And, when you've been working on an application for close to a decade, you can bet dollars-to-donuts that the database schema isn't optimal. Usually, old MySQL databases aren't using utf8mb4
character-sets; and, the column collations might not be appropriate for their use-case. Recently, I've been thinking about changing the way one column in particular is used, moving from a HEX-encoded value to a Base64-encoded value. The issue there being that HEX is not case-sensitive whereas Base64 is case-sensitive. And, I wanted to see if I could use the COLLATE
clause to apply a case-sensitive search to a case-insensitive column in MySQL 5.6.49.
To set the context here, imagine that I have a secret_message
table in which secret messages can be looked up using a unique slug
value. Given the age of this table, it wasn't defined with any deep thought and might have a schema like this:
CREATE TABLE `secret_message` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`slug` varchar(50) NOT NULL,
`message` varchar(300) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_bySlug` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
While not shown in this schema, the columns are using a utf8_general_ci
collation (I'm assuming it's not showing here because that's the default collation in the database?).
When the slug
column is populated with HEX values, the case-sensitivity doesn't really matter because, with HEX, aaa
and AAA
are the same value. However, when it comes to Base64, case means something. Which means that, with Base64, aaa
and AAA
are distinct values.
So, when I start storing Base64 values in this table, I need my WHERE
clause to be able to distinguish between aaa
and AAA
. With MySQL, we can do this by using the COLLATE
clause and the utf8_bin
collation (in my particular case):
WHERE slug COLLATE utf8_bin = 'aaa'
But, we have to be careful about our database indexes. Just as when using a Function on a column, attempting to apply a COLLATE
will bypass an index. To see what I mean, let's populate the secret_message
table with some similar looking slugs:
INSERT INTO secret_message
( slug, message )
VALUES
( 'aaaa', 'You are a rock star!' ),
( 'AAAA', 'You are a doofus!' ),
( 'aAaA', 'You are a magical unicorn!' )
;
Now, let's attempt to query for one of the slugs using just the COLLATE
clause:
EXPLAIN SELECT
m.id,
m.slug,
m.message
FROM
secret_message m
WHERE
m.slug COLLATE utf8_bin = 'aaaa'
;
When we run this EXPLAIN
statement, we get the following result:
As you can see, despite the fact that the slug
column is indexed, attempting to search the column using COLLATE
leaves us performing a full-table scan. To fix this, we need to perform both a case-insensitive search and a case-sensitive search:
EXPLAIN SELECT
m.id,
m.slug,
m.message
FROM
secret_message m
WHERE
-- NOTE: We have to perform the CASE INSENSITIVE search first so that we
-- can leverage the INDEX on the table column, `slug`. If we try to go
-- directly to the COLLATE directive, the query won't use the index and
-- we do a full-table scan.
m.slug = 'aaaa'
AND
m.slug COLLATE utf8_bin = 'aaaa'
;
And, when we run this EXPLAIN
statement, we get the following result:
As you can see, now that we are using both comparisons, the database is only searching three rows for the three similar looking slugs:
aaaa
AAAA
aAaA
This is because the first WHERE
condition performs a case-insensitive search on the slug
column and then the second WHERE
condition further filters the results based on a case-sensitive collation.
And, to demonstrate that this is actually locating the one correct row, here's the same query without the EXPLAIN
directive:
As you can see, we located the aaaa
row and ignored the similar AAAA
and aAaA
rows.
Obviously, in a perfect world, the schema of the table would just be updated to use a different collation so that we don't have to do anything special in our SQL statement. But, welcome to the real world where data is messy and deadlines are tight and you don't always get to change things to suit your needs. In such cases, it's nice to see that we can overlay a case-sensitive search on top of a case-insensitive column definition in MySQL.
Want to use code from this post? Check out the license.
Reader Comments