Where Does Serialization / Deserialization Belong In A Database Access Workflow?
A ColdFusion web application is composed of a series of nested abstractions. Each abstraction layer hides some level of private detail and exposes data for public consumption. For most of the work that I do, the exposed data is one-dimensional. But, on occasion, I need to store complex object structures. As a simple example, I might have a MySQL table with a JSON column. Which means that each record that I read from said database table contains both normal data and serialized data. Which raises the question: where in the data access workflow should the embedded serialized data (JSON) be deserialized?
For the sake of discussion, let's imagine that I'm working on a job queue. And, I want the job queue to be flexible; so, instead of encoding various job details into a variety of strict columns, I have one loose JSON column that contains all job-relevant information:
CREATE TABLE `job` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`details` json NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Generally, when I'm building my layers of abstraction around a given "entity", I have at least two ColdFusion components:
One for low-level data persistence. Normally, I call this a "Gateway" component. As in, JobGateway.cfc.
One for high-level entity management. Normally, I call this a "Service" component. As in, JobService.cfc.
The goal of the gateway component is to provide an interface that hides the persistence mechanism. Meaning, the JobService.cfc can depend upon the JobGateway.cfc methods without knowing if there's a SQL database, a file system, an ORM (Object-Relational Mapping) tool, or an in-memory cache being used behind the scenes.
Aside: In reality, there's always some degree of "implementation leakage", especially when you need to use something like a database transaction in order to group two database operations into an atomic change. In theory, you can try to abstract the notion of a database transaction with something like a "Unit of Work"; but, this ends up creating a lot of unfortunate indirection which makes the code harder to read and maintain (in my opinion).
So the question becomes: when passing a job to the JobGateway.cfc, is the passed-in details property a struct? Or, is it a serialized JSON string? Conversely, when reading a job from the JobGateway.cfc, is the returned details property a struct? Or, is it a serialized JSON string?
For the longest time, I'd have answered this question by passing around JSON strings and relegating the serialization and deserialization work to the service layer. This decision was driven entirely by the fact that my gateway layer was nothing but a thin wrapper around SQL queries. And, that all of my gateway methods returned query objects.
ColdFusion query objects can contain other complex values (they can even contain nested query objects). But, attempting to put structured data inside a query column always felt unnatural to me.
But then, modern ColdFusion releases started allowing the CFQuery tag to return query data in alternate formats: an array of structs (returnType="array") and a key-based struct index (returnType="struct"). This suddenly meant that my gateway components could start returning non-query results without any extra effort. Which gave me a chance to step back and re-evaluate my historical data workflow choices.
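For reference, here's a minimal sketch of what those two variants might look like (the query itself is illustrative, not from the post). Note that the struct-based format also requires a columnKey option indicating which column's values should become the struct keys:

```
// Returns an array of structs (one struct per record).
jobs = queryExecute(
	"SELECT id, details FROM job",
	{},
	{ returnType: "array" }
);

// Returns a struct of structs, keyed on the "id" column.
jobsById = queryExecute(
	"SELECT id, details FROM job",
	{},
	{
		returnType: "struct",
		columnKey: "id"
	}
);
```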
By putting the serialization and deserialization work into my service layer, I was fighting against an issue known as "impedance mismatch". This is the notion that the data structures I need to use in-process are different from the data structures I need to persist.
I've always been aware of this problem. But, I've always erred on the side of keeping my gateway components simple. In my mind, this was the lesser of two evils.
But, with the evolving functionality of the CFQuery tag, I feel like I can now move this logic into the gateway without having to compromise on complexity. To illustrate, here's what my (truncated) JobGateway.cfc might look like:
component {

	/**
	* I create a job with the given details.
	*/
	public numeric function createJob( required struct details ) {

		var results = queryExecute(
			"
				INSERT INTO
					job
				SET
					details = :details
			",
			{
				details: {
					value: serializeJson( details ), // !! SERIALIZE DATA !!
					cfsqltype: "cf_sql_varchar"
				}
			},
			{
				result: "local.metaResults"
			}
		);

		return( metaResults.generatedKey );

	}

	/**
	* I get job with the given ID.
	*/
	public array function getJobByID( required numeric id ) {

		var results = queryExecute(
			"
				SELECT
					j.id,
					j.details
				FROM
					job j
				WHERE
					j.id = :id
			",
			{
				id: {
					value: id,
					cfsqltype: "cf_sql_bigint"
				}
			},
			{
				result: "local.metaResults",
				returnType: "array"
			}
		);

		// !! DESERIALIZE DATA !!
		for ( var result in results ) {

			result.details = deserializeJson( result.details );

		}

		return( results );

	}

}
Notice that the details property is passing through the JobGateway.cfc membrane as a struct. And, that I'm calling serializeJson() internally when persisting the data; and, that I'm calling deserializeJson() internally after I've read data back from the database. Now, the calling context - my JobService.cfc component - doesn't have to know about the serialization at all.
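To make that division of labor concrete, here's a hypothetical (truncated) JobService.cfc - the validation rule and property names are illustrative, not from the original post - in which the service deals only in native ColdFusion structs:

```
component {

	/**
	* I create a job with the given details, returning the new job ID.
	*/
	public numeric function createJob( required struct details ) {

		// Illustrative business-level validation. Note that the service
		// works with a native struct - it never sees the JSON encoding,
		// which is owned entirely by the gateway.
		if ( ! details.keyExists( "type" ) ) {

			throw( type = "Job.MissingType" );

		}

		return( jobGateway.createJob( details ) );

	}

}
```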
This feels like the right choice; at least for JSON. But, it now calls into question other, more subtle serialization scenarios; such as using TINYINT columns for Boolean flags.
Again, historically for TINYINT columns, I've just returned the query object from the gateway layer and deferred Boolean casting to the service layer. But, if I'm going to be handling the JSON transformations inside the gateway components, it follows that I should also handle TINYINT transformations inside the gateway components.
To consider this case, imagine that my job table now has an isProcessed flag. This flag is meant to be a Boolean; but, it is persisted as a TINYINT. If I wanted to handle the transformation internally to the gateway, here's what the updated code might look like:
component {

	/**
	* I create a job with the given details.
	*/
	public numeric function createJob(
		required struct details,
		required boolean isProcessed
		) {

		var results = queryExecute(
			"
				INSERT INTO
					job
				SET
					details = :details,
					isProcessed = :isProcessed
			",
			{
				details: {
					value: serializeJson( details ), // !! SERIALIZE DATA !!
					cfsqltype: "cf_sql_varchar"
				},
				isProcessed: {
					value: isProcessed, // ColdFusion will auto-cast for us.
					cfsqltype: "cf_sql_tinyint"
				}
			},
			{
				result: "local.metaResults"
			}
		);

		return( metaResults.generatedKey );

	}

	/**
	* I get job with the given ID.
	*/
	public array function getJobByID( required numeric id ) {

		var results = queryExecute(
			"
				SELECT
					j.id,
					j.details,
					j.isProcessed
				FROM
					job j
				WHERE
					j.id = :id
			",
			{
				id: {
					value: id,
					cfsqltype: "cf_sql_bigint"
				}
			},
			{
				result: "local.metaResults",
				returnType: "array"
			}
		);

		// !! DESERIALIZE DATA !!
		for ( var result in results ) {

			result.details = deserializeJson( result.details );
			result.isProcessed = !! result.isProcessed;

		}

		return( results );

	}

}
As you can see, we're now handling transformations of both the details property and the isProcessed property such that the gateway is always accepting and returning ColdFusion-native structs and Booleans. Again, this works to hide the underlying persistence mechanism (and the impedance mismatch) from the calling context.
Balancing Pedantry And Pragmatism
One of the big selling points of ColdFusion is its simplicity. And the fact that it really "just works" in so many different scenarios. But, the ColdFusion of old isn't the same language that it is today. Modern ColdFusion has evolved into a mature, sophisticated, powerful runtime; and, it now merits more sophisticated thinking.
Take the TINYINT example from above. In early versions of ColdFusion, passing around a 1 instead of a native Boolean was fine because:
ColdFusion didn't really have native Booleans anyway.
ColdFusion's built-in functions happily auto-cast between many different data representations (still true today).
ColdFusion generally owned the end-to-end rendering of a webpage request.
JSON - now the de facto serialization standard - hadn't even been invented yet.
These early constraints of the young web were great because it meant that abstractions really didn't matter all that much. But this isn't the world we live in today. Today, abstractions do matter. And, data types do matter.
Consider trying to call the yesNoFormat() built-in function as a member method:

writeOutput( value.yesNoFormat() )

If you call that on a true value, you get the output, Yes. However, if you attempt to call that member method on a TINYINT value passed back from the database, you get the following ColdFusion error:

The function [yesNoFormat] does not exist in the Numeric.
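One simple work-around - a sketch, assuming the TINYINT has already been read from the database into value - is to explicitly cast the numeric value to a native Boolean before using Boolean member methods:

```
// The double-not (!!) operator casts the numeric TINYINT value to a
// native ColdFusion Boolean, after which Boolean member methods work.
var isProcessed = !! value;

writeOutput( isProcessed.yesNoFormat() ); // Outputs "Yes" when value is 1.
```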
Taking a moment to stop and think about data serialization may seem pedantic. But, in the modern world, data types matter. How we choose to pass around data matters.
Where Should "Encryption At Rest" Take Place?
As I was thinking about this topic, I thought it might be worth considering encrypting data at rest. I don't mean encryption at the file-system level (a security feature that the database itself might offer), I mean encryption at the column level. Such as encrypting a stored password or an API key.
Encryption, while it is an example of data transformation, is not the same as serialization. As such, I wouldn't put encryption and decryption in the data access layer (Gateway). I probably wouldn't even put it in the Service layer.
To me, encrypting data for storage is a requirement of the business, not of the entity. As such, it makes sense to handle encryption and decryption at the "workflow" layer - i.e., the highest layer of the "application core" that coordinates commands and queries against the system.
So, for example, both a Gateway component and a Service component would need to know that the password property is persisted as a string; but, only the higher Workflow layer would know that this string is actually a bcrypt-encoded one-way hash with a unique salt.
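As a sketch of that layering - assuming Adobe ColdFusion 2021's generateBCryptHash() function, and hypothetical userService / workflow component names that aren't from the original post - the hashing happens above the service layer:

```
component {

	/**
	* I register a new user, returning the new user ID.
	*/
	public numeric function registerUser(
		required string email,
		required string password
		) {

		// The workflow layer owns the business rule: passwords are stored
		// as one-way bcrypt hashes (generateBCryptHash() applies a unique
		// salt automatically). The service and gateway layers below only
		// ever see an opaque string.
		var passwordHash = generateBCryptHash( password );

		return( userService.createUser( email, passwordHash ) );

	}

}
```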
The Lines Will Always Be Blurry
There is the academia of best practices; and then, there is the reality of building complex applications under deadline. These two concerns are always in some degree of conflict. For example, in a "perfect world", we could completely hide our data persistence mechanism. But, in the "real world", knowing that we have a relational database under the hood allows us to build idempotent workflows with much less complexity.
To paraphrase Morpheus, the trick to building applications isn't about strict adherence to best practices at all cost, it's about knowing which rules can be bent and which can be broken.
Reader Comments
I do the same, though I just add a .map() to the end of the queryExecute rather than adding in a separate loop.

@Andrew,
That's totally legit. The only thing that I get frustrated about with Query.map() is that it generates another query. I almost wish there was a Query method for .mapToArray(). But, that's just a personal preference, not a value judgement.

Fair enough, using your syntax, I'd just do this:
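(The commenter's original snippet wasn't preserved; the following is a hypothetical reconstruction of the map-based approach, assuming returnType="array" and arrow-function support, available in Lucee 5+ and Adobe ColdFusion 2018+:)

```
return queryExecute(
	"
		SELECT
			j.id,
			j.details
		FROM
			job j
		WHERE
			j.id = :id
	",
	{
		id: {
			value: id,
			cfsqltype: "cf_sql_bigint"
		}
	},
	{ returnType: "array" }
).map( ( row ) => {

	// !! DESERIALIZE DATA !!
	row.details = deserializeJson( row.details );
	return( row );

} );
```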
@Andrew,
Looks solid to me 💪