Converting UUIDs To Binary For VARBINARY(16) Storage In MySQL And ColdFusion
The other day, while recording a Working Code podcast episode, I mentioned to Adam that a big blind-spot in my database mental model was storing non-AUTO_INCREMENT
primary keys. Or, more specifically, using something like a UUID (Universally Unique Identifier), GUID, or CUID as a table's primary key. As such, I wanted to start building up some foundational knowledge. And, based on what I've been reading, it seems that being able to convert a UUID string to and from a binary value is an important point of know-how. This post looks at performing this String-to-Binary conversion in ColdFusion.
To be clear, I am not a database expert! Yes, I love writing SQL. And yes, I love thinking deeply about database index design. But, I'm not one of those people who knows much about low-level storage details, engine ramifications, data replication, or any of the many complex topics that go into database management. Consider this post a note-to-self more than anything.
To start learning about storing Strings as primary keys, I did some reading:
- MySQL Blog: Storing UUID Values in MySQL Tables
- Rick James: GUID/UUID Performance Breakthrough
- Percona Blog: Storing UUID Values in MySQL
From what I've seen in these articles - which is echoed in many StackOverflow posts - is that using Strings as primary keys is a trade-off: in return for having system-independent uniqueness, you incur larger indexes, larger working memory, possible performance hits, less intuitive values (pro-or-con depending on how you see it), and more complex workflows.
This post doesn't tackle all of those issues - I'm here to noodle on just one of them: larger indexes. Part of the index-size issue comes from how the value is stored. If a UUID is a 35-character String, storing said UUID as a String requires 35-bytes (1 byte per character).
And, that's just for the column value itself. When you consider that the primary key is implicitly stored as the suffix on a secondary index, the storage requirements of a "UUID as String" is multiplied by the number of indexes on the table. Not to mention that any other table using said UUID as a foreign key will also need 35-bytes.
A common suggestion for reducing storage size is to persist the value as a VARBINARY(16)
instead of a VARCHAR(35)
. This technique is based on the fact that a UUID is already a HEX-encoded value. As such, converting a UUID into a Byte Array requires little more than a binaryDecode()
call.
Converting a binary value back into a UUID is a little more work since we have to re-insert the dashes (-
) after we generate the String. Here's two User Defined Functions (UDFs) that I created for managing this conversion in ColdFusion:
<cfscript>
/**
* I convert the given UUID string to a byte array (binary value) for use in a MySQL
* VARBINARY(16) database field.
*/
public binary function uuidToBinary( required string input ) {
// The UUID string is already a hex-encoded representation of data. As such, to
// convert it to binary, all we have to do is strip-out the dashes and decode it.
return(
binaryDecode(
input.replace( "-", "", "all" ),
"hex"
)
);
}
/**
* I convert the given MySQL VARBINARY(16) byte array (binary value) to a ColdFusion
* UUID string.
*/
public string function binaryToUuid( required binary input ) {
var asHex = binaryEncode( input, "hex" );
// ColdFusion UUIDs use the format: xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx.
return(
asHex.left( 8 ) & "-" &
asHex.mid( 9, 4 ) & "-" &
asHex.mid( 13, 4 ) & "-" &
asHex.right( 16 )
);
}
</cfscript>
To try these functions out, I created a simple MySQL database table that uses a VARBINARY
primary-key and a value
column that stores the UUID in plain-text so that we can confirm values:
CREATE TABLE `uuid_test` (
`uid` varbinary(16) NOT NULL,
`value` varchar(255) NOT NULL,
PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Then, I generated 10,000 rows in this test table. Note that in my INSERT
, I'm using a CFQueryParam
of type binary
for the primary-key column.
<cfscript>
// Include are `uuidToBinary()` and `binaryToUuid()` UDFs.
include "./functions.cfm";
loop times = 10000 {
uid = createUuid();
```
<cfquery>
INSERT INTO
uuid_test
SET
uid = <cfqueryparam value="#uuidToBinary( uid )#" sqltype="binary" />,
value = <cfqueryparam value="#uid#" sqltype="varchar" />
;
</cfquery>
```
}
</cfscript>
To then test the SELECT
ing of rows, I looked in the database table, grabbed a UUID from about half-way through the table, and used it to locate the row. Notice that I'm using the uuidToBinary()
to perform the look-up; and then, I'm using the binaryToUuid()
to consume the key in my ColdFusion code:
<cfscript>
// Include are `uuidToBinary()` and `binaryToUuid()` UDFs.
include "./functions.cfm";
// A UUID randoly picked from half-way through the records.
uid = "6D9F382A-5164-48EF-8DDEA942D5EAE8E3";
```
<cfquery name="results">
SELECT
t.uid,
t.value
FROM
uuid_test t
WHERE
t.uid = <cfqueryparam value="#uuidToBinary( uid )#" sqltype="binary" />
;
</cfquery>
```
dump( results.uid );
dump( results.value );
// Use our custom functions to convert the VARBINARY back to a String for consumption
// within the ColdFusion application.
dump( binaryToUuid( results.uid ) );
</cfscript>
When we run this ColdFusion (Lucee CFML) code, we get the following output:
As you can see, we were able to locate the row in the database using the VARBINARY
value that we generated with uuidToBinary()
. Then, we were able to convert the binary value back into a ColdFusion-formatted UUID using the binaryToUuid()
function.
And, if we run an EXPLAIN
on this query, we can see that it uses the implicit primary-key index to look up a single-row without any table scanning!
NOTE: I'm using MySQL's
UNHEX()
method in this case since I'm running thisEXPLAIN
right in the database interface:
As you can see, this SQL query is using the implicit primary key (PKEY) index. And, is able to locate the single row using the index without any table scanning.
MySQL 5.7 vs MySQL 8
While MySQL 5.7 ships with a function to generate UUIDS (time-based, version 1), it leaves the string-to-binary conversions up to the application server. As of MySQL 8, however, the database engine now includes additional functions to perform these conversions in the SQL context:
UUID_TO_BIN()
BIN_TO_UUID()
For the time-being, I'll be sticking with Integer-based AUTO_INCREMENT
columns for my primary keys. But, at least I feel like I'm finally starting to build up my mental model for what a String-based primary-key might look like. The UUIDs that ColdFusion generates (random, version 4) incur additional problems with regard to index structure and storage; but, that's a whole other topic (of which I know very little).
Want to use code from this post? Check out the license.
Reader Comments
You'll certainly want to run speed tests on using long strings (versus the smallest possible integer).
@Figital,
Yeah, this is my biggest fear. Having an
int
as the primary key feels like such a known, battle-tested approach. With Strings, I feel like I'm stepping back into the complete unknown. Plus, withint
, the primary-key can essentially step-in as anORDER BY
column as well. But, with a String, that might not be in any particular order, you can't take those kind of short-cuts ... unless the String leads with a date/time-stamp ... but again, just a lot of stuff I don't have a handle on yet.I wouldn't lose sleep over this!
It depends on what the key will be used for ... if it's only for indexing / foreign keys (and most of them are) then stick to smallest ints. If you need something more human readable or universally unique (you'd never need both at the same time) ... then those are each different types of data (and not the same thing).
You can also have all of them in the same record ... I've had tables with auto-increments, uuids, and human readable tokens all in the same row (I don't feel like remembering why right now though oops) ... perhaps only because looking up a top level record by a string is okay ... but doing deep SQL joins ... BAD! (and you usually need to do both)
Actually most of the times I've needed to create reasonably unique, reasonably human readable identifiers is because the data was going to reside OUTSIDE of a relational database .... like a filename in a filesystem or a serial number stuck to a widget.
@Figital,
Now that you say that, I definitely have a few tables where we have
int
for the primary key and avarchar(50)
for a "non-guessable" key that can be referenced externally (and rotated as well should we need to, in theory). Thatvarchar
has a unique-index on it, and I never worried about it before.Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →