Serializing A MySQL RecordSet As A Set Of INSERT Statements Using Lucee CFML 5.2.9.40
This past week at InVision, some data was accidentally deleted from one of our MySQL databases. In order to get the data back, our Data Services team had to restore one of the backups. And then, I had to write a ColdFusion script that would transfer a slice of the data from the restored-database back into the production database. Of course, for security reasons, I was only given access to the backup. As such, I had to locate the deleted records, serialize them as a series of INSERT
statements, and then hand those SQL queries off to the Data Service team for execution. This was actually a really fun task to work on; and so, I thought it might be interesting to share my basic approach.
The deleted-data was spread across multiple tables. As such, my data restoration algorithm had to span multiple tables and multiple queries. What I ended up using was a Struct in which each key was a tablename and each value was an Array of database records. It was little more than a glorified nested loop:
- For each table:
- For each record:
- Serialize record as
INSERT
- Serialize record as
- For each record:
The most interesting part of this algorithm was the serialization of the record into the INSERT
statement. After some trial and error, I found that I had to massage some data-type, leave some data-types as is, and Hex-encode other types in an attempt to mitigate any lurking SQL-injection attacks in the data.
Here's a simplified version of the script that looks at a single table, product
. Please note that I am working with a MySQL database, so I am using some syntax that MySQL-specific. I authored this script to be readable in a top-down manner:
<cfscript>
// The results collections is a map of Table Names to Records.
results = {
"product": queryExecute(
sql = "
SELECT
id,
categoryID,
name,
price,
sku,
isOnSale,
createdAt,
deletedAt
FROM
product
;
",
options = {
datasource = "testing",
returnType = "array"
}
)
};
// Map the results collection onto a multi-statement set of INSERT operations.
sqlPayload = generateSql( results );
echo( "<code><pre>#encodeForHtml( sqlPayload )#</pre></code>" );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I generate a series of SQL INSERT statements for the given results. The results
* object is expected to be in form of { [ key: tablename ]: record[] }.
*
* @results I am the collection of table-records being mapped to INSERT statements.
*/
public string function generateSql( required struct results ) {
var statements = results.keyArray().map(
( tablename ) => {
return( generateSqlForTable( tablename, results[ tablename ] ) );
}
);
return( statements.toList( chr( 10 ) ) );
}
/**
* I generate a series of SQL INSERT statements for the given records.
*
* @tablename I am the table into which records are being inserted.
* @records I am the set of records.
*/
public string function generateSqlForTable(
required string tablename,
required array records
) {
var statements = records.map(
( record ) => {
return( generateInsert( tablename, record ) );
}
);
return( statements.toList( chr( 10 ) ) );
}
/**
* I generate a SQL INSERT statement for the given record.
*
* CAUTION: Uses an INSERT SET style syntax that is supported by MySQL.
*
* @tablename I am the table into which the record is being inserted.
* @record I am the record being inserted.
*/
public string function generateInsert(
required string tablename,
required struct record
) {
var tab = chr( 9 );
var columnnames = record.keyArray();
var columnCount = columnnames.len();
var statement = [];
statement.append( "INSERT INTO" );
statement.append( "#tab#`#tablename#`" );
statement.append( "SET" );
statement.append(
columnnames.map(
( columnname, i ) => {
var assignment = record.keyExists( columnname )
? "`#columnname#` = #escapeValue( record[ columnname ] )#"
: "`#columnname#` = NULL"
;
var suffix = ( i < columnCount )
? ","
: ""
;
return( tab & assignment & suffix );
}
),
true // Merge all values.
);
statement.append( ";" );
return( statement.toList( chr( 10 ) ) );
}
/**
* I escape and serialize the given value for use in a SQL INSERT assignment.
*
* CAUTION: Uses the UNHEX() function that is supported by MySQL.
*
* @value I am the value being escaped.
*/
public string function escapeValue( required any value ) {
// For dates, we have to serialize the value using a string format that MySQL
// will properly interpret as a Date/Time value.
// --
// CAUTION: I am explicitly using the TIMEZONE since all dates are stored in the
// database as UTC; however, the server I am running this demo on currently is
// set to use EST. As such, the dates will default to serializing in the EST
// timezone unless I explicitly define the target timezone. Really, the server I
// am on should ALSO be running in UTC. Not sure why it's using EST.
if ( isInstanceOf( value, "java.util.Date" ) ) {
return( "'" & value.dateTimeFormat( "yyyy-mm-dd HH:nn:ss.l", "UTC" ) & "'" );
}
// For numbers, we should be able to serialize them as-is.
if ( isNumeric( value ) ) {
return( value );
}
// For non-numeric values, we're going to HEX-encode the value, and then UNEX
// it within the SQL statement. This side-steps having to worry about escaping
// values that may lead to SQL-INJECTION ATTACKS.
var binaryValue = charsetDecode( value, "utf-8" );
var hexValue = binaryEncode( binaryValue, "hex" );
return( "`UNHEX`( '#hexValue#' )" );
}
</cfscript>
This algorithm is not designed to account for every single data-type that MySQL offers - it accounts only for the data-types that we use. And, when I run the above Lucee CFML code on a demo table, I get the following output:
As you can see, each record in the set of database tables was serialized as a single MySQL INSERT INTO ... SET
statement (special thanks to Bryan Stanley who taught me this syntax).
In a ColdFusion query, I would normally use a <cfqueryparam>
tag to protect myself against SQL Injection attacks. However, since these queries are going to be executed outside of Lucee, I had to explicitly escape Strings. To keep things simple, I just Hex-encoded the value in Lucee and then used the MySQL UNHEX()
function to translate it back to string format. By doing this, I prevent any embedded malicious (or special characters) from messing up the SQL statements.
I then took these MySQL statements, zipped them up, and passed them over to the Data Service team for execution. Of course, before doing that, I was able to run the INSERT
statements against a local version of the database schema to make sure everything populated as expected.
I am sure that there are professional tools out there that would do some of this work for me. But, I love to write SQL; and, I don't know any of those tools; so, Lucee CFML was the right tool for the job at the time.
Want to use code from this post? Check out the license.
Reader Comments
Hi Ben, is there a reason you're using MySQL instead of MariaDB?
@Señor,
I use MySQL 1) out of habit, and 2) it's what this app has used for the past 7-years. What's the high-level overview of MariaDB? I believe the last I heard, it's just a drop-in replacement for MySQL. Is it just supposed to be faster, but all the same API?
@All,
So, over the past two days, I've been working on some data remediation for a client (who accidentally bulk-removed users from our platform). And, as part of that remediation, I've had to generate
INSERT
statements fromSELECT
statements directly within MySQL:www.bennadel.com/blog/4116-generating-sql-insert-statements-from-sql-select-statements-in-mysql-5-7-32.htm
Syntax-wise, it's harder to do in SQL vs. ColdFusion; but, it got the job done.