An Experiment In Using An Embedded Apache Derby Database To Power ColdFusion Query-Of-Queries
Before we get into this, I just want to say that this was nothing more than a fun experiment. I don't honestly think that this approach holds any kind of practical value; mostly, I just wanted to see if I could make it happen (at least in a proof-of-concept kind of way).
It's no secret that I love ColdFusion query-of-queries. But, admittedly, they do have some serious limitations. Speed aside, they only allow SELECT statements to be executed. In the past, I've used some fun string manipulation to convert UPDATE and DELETE statements into SELECT statements. But ultimately, I was still using the same underlying query-of-queries technology. To experiment with a slightly different approach, I wondered if I could use an embedded Apache Derby database to build and populate temporary tables on top of which the "query of queries" SQL could be executed.
To see what I'm talking about, take a look at the following code. We start out with one query that was pulled from an actual MySQL database; then, we run our derby-powered query-of-queries on it, executing INSERT, UPDATE, DELETE, and SELECT statements:
<cfoutput>
<!--- Query for all the friends. --->
<cfquery name="friends" datasource="testing">
SELECT
f.id,
f.name
FROM
friend f
ORDER BY
f.name ASC
</cfquery>
ORIGINAL NAMES:<br />
<!--- Output the original names. --->
<cfloop query="friends">
#friends.name#<br />
</cfloop>
<!--- ------------------------------------------------- --->
<!--- ------------------------------------------------- --->
<br />
<!--- ------------------------------------------------- --->
<!--- ------------------------------------------------- --->
<!---
Now, perform a "query of queries" on the above object.
This will actually run in an embedded Apache Derby database.
--->
<cf_qoq name="friends">
INSERT INTO {friends}
(
id,
name
) VALUES (
6,
'Jennifer'
);
INSERT INTO {friends}
(
id,
name
) VALUES (
7,
'Kathleen'
);
UPDATE
{friends}
SET
name = 'Sarah'
WHERE
name = 'Sara'
;
DELETE FROM
{friends}
WHERE
name LIKE '%a'
;
SELECT
f.id,
f.name
FROM
{friends} f
ORDER BY
f.name ASC
;
</cf_qoq>
NEW NAMES:<br />
<!--- Output the updated names. --->
<cfloop query="friends">
#friends.name#<br />
</cfloop>
</cfoutput>
In the augmented query-of-queries SQL, our table variables have to be wrapped in {table} notation. This is because the underlying ColdFusion custom tag actually goes through the raw SQL and replaces those instances with the names of temporary, populated Apache Derby database tables.
When we run the above code, we get the following output:
ORIGINAL NAMES:
Joanna
Kim
Nicole
Sara
Tricia
NEW NAMES:
Jennifer
Kathleen
Kim
Nicole
Sarah
As you can see, all four statements - INSERT, UPDATE, DELETE, and SELECT - executed properly, leaving us with the resultant ColdFusion query object.
Let's take a look at the QoQ.cfm ColdFusion custom tag. Since this was just a proof of concept, I won't really go into any detailed explanation:
qoq.cfm ColdFusion Custom Tag
<!--- Check to see which mode the tag is executing. --->
<cfif (thisTag.executionMode eq "start")>
<!---
Make sure we have a name to work with - these query of
queries are NON-destructive. As such, they have to be moved
into another query variable.
--->
<cfparam
name="attributes.name"
type="variablename"
/>
<!--- Define the DSN to be used for this scratch database. --->
<cfset dsn = "queryofqueries" />
<cfelse>
<!--- Get the SQL from the generated content. --->
<cfset rawSql = thisTag.generatedContent />
<!---
Gather the table names that were referenced in the SQL code
- we will have to convert these to temp tables.
--->
<cfset tableNames = reMatch(
"\{[^}]+\}",
rawSql
) />
<!--- Make sure that at least one table was referenced. --->
<cfif !arrayLen( tableNames )>
<cfthrow
type="InvalidSQL"
message="Your SQL does not reference any tables."
detail="SQL: [#rawSql#]."
/>
</cfif>
<!---
Now, loop over the table names to clean them up and to make
sure they actually exist in the caller scope.
--->
<cfloop
index="index"
from="1"
to="#arrayLen( tableNames )#"
step="1">
<!--- Clean up the table name. --->
<cfset name = reReplace(
tableNames[ index ],
"[{}]+",
"",
"all"
) />
<!--- Make sure the query variable actually exists. --->
<cfif !structKeyExists( caller, name )>
<cfthrow
type="InvalidTableReference"
message="The table [#name#] could not be found."
detail="The query variable with the lable [#name#] could not be accessed."
/>
</cfif>
<!--- Add the clean table name back into the array. --->
<cfset tableNames[ index] = name />
</cfloop>
<!---
Now that we've validated the tables, we have to loop over
each one to create it in our embedded derby database. Let's
create a mapping of real names to temp tables.
--->
<cfset tempTables = {} />
<!--- Wrap all the table generation in a transaction. --->
<cftransaction>
<!---
Loop over the tables to create and populate the
temp counterparts.
--->
<cfloop
index="name"
array="#tableNames#">
<!--- Generate a random name for the temp table. --->
<cfset tempName = replace(
("TABLE-" & createUUID()),
"-",
"",
"all"
) />
<!--- Store the temp table in our map. --->
<cfset tempTables[ name ] = tempName />
<!--- Get a reference to the query object itself. --->
<cfset queryObject = caller[ name ] />
<!---
Get the meta data for the query; this will hopefully
give us the columns and data types we need to construct
a true SQL table.
--->
<cfset metaData = getMetaData( queryObject ) />
<!--- Create our temp table. --->
<cfquery name="createTable" datasource="#dsn#">
CREATE TABLE #tempName# (
<!---
Loop over the meta data to design the columns
of the temp table we need to create.
--->
<cfloop
index="columnIndex"
from="1"
to="#arrayLen( metaData )#"
step="1">
<!--- Check for the need for a comma. --->
<cfif (columnIndex gt 1)>
,
</cfif>
<!--- Define the column name. --->
#metaData[ columnIndex ].name#
<!--- Define the column type. --->
<cfif reFindNoCase( "int|double|float", metaData[ columnIndex ].typeName )>
DOUBLE
<cfelse>
<!---
By default, the column will be a
varchar. I wanted to use a LONG
varchar, but those don't allow any
comparisons to be executed.
--->
VARCHAR( 3000 )
</cfif>
</cfloop>
)
</cfquery>
<!---
Now that we have created the table, we have to
populate it with the original query data. We have to
execute these
--->
<cfloop query="queryObject">
<!--- Add the row of data. --->
<cfquery name="populateTable" datasource="#dsn#">
INSERT INTO #tempName# VALUES (
<!---
Loop over the column names to select the
column values. It is important that these
columns be in the same order as the table
creation since we are not defining the
name within the INSERT statement.
--->
<cfloop
index="columnIndex"
from="1"
to="#arrayLen( metaData )#"
step="1">
<!--- Check for the need for a comma. --->
<cfif (columnIndex gt 1)>
,
</cfif>
<!--- Insert the value. --->
<cfif reFindNoCase( "int|double|float", metaData[ columnIndex ].typeName )>
<!--- Numeric. --->
#queryObject[ metaData[ columnIndex ].name ][ queryObject.currentRow ]#
<cfelse>
<!--- String. --->
'#queryObject[ metaData[ columnIndex ].name ][ queryObject.currentRow ]#'
</cfif>
</cfloop>
)
</cfquery>
</cfloop>
</cfloop>
<!---
At this point, we've created all of our temp tables and
populated them with the query object data. Now, we have
to take the SQL from this tag and execute it. Derby does
not allow multi-queries; as such, we have to split the
raw sql on the ; characrter (proof of concept).
--->
<!---
Before we run the sql, let's replace the tables names
with the temp table names.
--->
<cfloop
index="name"
array="#tableNames#">
<cfset rawSql = replace(
rawSql,
"{#name#}",
tempTables[ name ],
"all"
) />
</cfloop>
<!---
Now, let's loop over the portions of the SQL to execute
them against the temp database.
--->
<cfloop
index="sqlPortion"
list="#rawSql#"
delimiters=";">
<!--- Make sure the SQL portion has content. --->
<cfif len( trim( sqlPortion ) )>
<!--- Execute the sql. --->
<cfquery name="lastResult" datasource="#dsn#">
#preserveSingleQuotes( sqlPortion )#
</cfquery>
</cfif>
</cfloop>
<!--- Clear the generated content. --->
<cfset thisTag.generatedContent = "" />
<!---
Check to see if we have a value to store into our
result variable.
--->
<cfif structKeyExists( variables, "lastResult" )>
<!---
Store the last result. When doing this, use the
duplicate() method - this will fix a bug with CFDump
and identity columns on our temp queries.
--->
<cfset caller[ attributes.name ] = duplicate( lastResult ) />
</cfif>
<!---
Roll back all of the database updates. This will keep our
scratch database clean (including table creation).
--->
<cftransaction action="rollback" />
</cftransaction>
</cfif>
As you can see, this ColdFusion custom tag grabs references to the original query objects and uses them to create and populate temporary tables in the embedded Apache Derby database. All of the SQL is executed in the context of a CFTransaction tag so as to leave the scratch database pristine after each use.
Needless to say, this code doesn't execute instantly. It has to perform a lot of setup work just be able to execute the actual query-of-query SQL code. But, like I said, this was just a fun experiment for the sake of exploration. I do love ColdFusion query-of-queries; and, I'd definitely love to see their functionality expanded in future releases of the language.
Want to use code from this post? Check out the license.
Reader Comments
I'm still holding to my beloved SQL WITH clause ;-)
Is there anything you could build off of this? Like, you said that it was an exercise in, "Can I do this?" That's all cool, but I prefer to do exercises in the hopes that I can tie another idea together with it that does something even cooler than before. Thus, did you have any epiphanies following this exercise?
Great for the "toolbox of my brain," but I'm not a fan of unitaskers (nod to Alton Brown). So, help me find a problem to solve with this solution, OB-1 Nadobi
@Randall, how about using it as a datasource for a game? Think of Farmville as a rough example: you don't want to hit the main database every time the player does something, so to speed things up, when the player signs in, you go to the DB server, grab the relevant game data, and copy it to "local" temporary tables (on the web server). You let the player do a certain amount of work and then sync it to the main DB.
You're trading off the possibility of losing a certain amount of work for the advantage of decreasing load on the database. In some cases, that's probably a good idea. (In others, it's probably not feasible: note that the Flash games that use something like this - using local data rather than remote data, then syncing - are not multiplayer games. You can do some types of multiplayer gaming this way, but it can become very complicated quickly.)
@Randall,
To be honest, the idea popped into my head last week and I really just wanted to try it out, take it to something tangible. That said, I do use query of queries, and have, at times, wished I could use the UPDATE statement on them. I do try to put as much of the work into the database itself, when I can; but, sometimes, I can't find (efficient) ways to calculate values on the DB side.
One particular use-case that I had a while back was that I had a table of users and a 3rd party API that managed subscription information. I wanted to create a query on the server that merged the two. So, I pulled down the query AND I pulled down the API stuff (via CFHTTP). It would have been nice to then run UPDATEs on the query based on the API data for that page -- also, the query was being cached so as to not have to hit the 3rd party API all the time.
Maybe a silly use-case. And, if we build the system over again, I'd probably find a way to make the syncing of certain data points for automated / scheduled. But for now, the infrastructure is just not there.
@Dave,
SMART!
My first reaction was trying this on HTML5's local SQL DB, but that wouldn't use CF :-/
@Ben,
Cool, and hey, like I said, sometimes you know point A and point C... but ideas like this are point B, thus an odd idea can sometimes bridge the gap and/or help you find a way around to the solution you were trying to achieve.
@Ben,
This is actually a useful test. I used to use QOQs quite a bit when I was coding in CF5 and used to manipulate using QuerySetCell and QueryAddRow etc. But had faced few weird errors after migrating to MX. So lately I rarely use QOQ. So I have moved on to arrays of structures instead. I particularly deal with ranges of numeric serial number which need to be broken at any point such as a start & end break would lead to two ranges and a mid break would lead to three ranges. I also at times have to cumulate the sequential serial numbers back into ranges. So being able to update/insert/delete into a query sounds tempting.
One other key thing I learnt from this post is that even a DDL like create table can be rolled back.. that's pretty cool. I had never tried that as I always thought rollback only applies to DML so i instead have tended to add a drop statement. Now this is a good aha moment for me...:) Thanks!
@Smita,
I *believe* that the CFTransaction tag applies to the CREATE command. But, I only say that because as I was building this, the DROP TABLE command seemed to say the table didn't exist when used outside the CFTransaction. So, I just assumed that's what was going on. I hope I am right :)
@Ben,
You are right! I tested before posting my comment..;)
@Smita,
Awesome! You rock :D
SMART!
My first reaction was trying this on HTML5's local SQL DB, but that wouldn't use CF :-/
it won't work indeed..