Creating In-Memory Tables In SQL For Joining And Sumarizing
I use this method all the time and it works great so I thought I would share it in case any one else might fancy it. Often times, I will need to get a summary of data form a SQL data table based on dates (either by day, month, year, or what have you - but this could be applied to non-date situation quite the same). The problem is, that I don't want to make too many calls to the database and I don't want to make some really complicated GROUP BY statement.
To overcome this, I create a temporary in-memory table that has, as its records, the dates that I need to summarize. I can't just create this table with initial data, so I have to declare the data table first:
DECLARE @dates TABLE (
today DATETIME
);
In this case, the data table only has only column of type DATETIME which will be the date that I am sumarizing. Then, I use ColdFusion to insert dates into the temp table. In this case, I am inserting two weeks worth of days:
<!--- Loop over the dates to create the table values. --->
<cfloop index="intOffset" from="14" to="0" step="-1">
INSERT INTO @dates (
today
)(
SELECT
<cfqueryparam
value="#Fix( Now() - intOffset )#"
cfsqltype="CF_SQL_TIMESTAMP"
/>
);
</cfloop>
Notice that for each iteration, I am inserting one record into the @dates table. Also notice how I am creating the date; I am just subratracting the offset from Now() and Fix()'ing it to get the day-only value. REMEMBER the wicked sweet date-math features of ColdFusion.
Once I have this table, I select right out of it, and for each date, I create a summary of information from another table.
SELECT
d.today,
<!--- Get the number of logins for this day. --->
(
SELECT
COUNT( * )
FROM
useraccount_login l
WHERE
l.date_logged_in BETWEEN d.today AND (d.today + 1)
) AS login_count,
<!--- Get me the number of unique logins for this day. --->
(
SELECT
COUNT( DISTINCT l.useraccount_id )
FROM
useraccount_login l
WHERE
l.date_logged_in BETWEEN d.today AND (d.today + 1)
) AS unique_login_count
FROM
@dates d
;
In this case, we are not performaing a Join, but this methodology could easily be used for joining on temporary tables. What I really like about this is that I only go to the database once.
One other excellent feature of this is that I never have to worry about NULL values or missing values. If I was to try and do this with some sort of GROUP BY clause, I might not get all days. Imagine a situation where no user logged into the system on a given day. Even if I did a GROUP BY, I would not get that day returned in the result set. When you manually create the temporary table with all the required dates, you KNOW you are coming back with the right values.
Putting it all together we get:
<!--- Declare temporary table. --->
DECLARE @dates TABLE (
today DATETIME
);
<!--- Loop over the dates to create the table values. --->
<cfloop index="intOffset" from="14" to="0" step="-1">
INSERT INTO @dates (
today
)(
SELECT
<cfqueryparam
value="#Fix( Now() - intOffset )#"
cfsqltype="CF_SQL_TIMESTAMP"
/>
);
</cfloop>
<!--- Get data summaries for log-ins. --->
SELECT
d.today,
<!--- Get the number of logins for this day. --->
(
SELECT
COUNT( * )
FROM
useraccount_login l
WHERE
l.date_logged_in BETWEEN d.today AND (d.today + 1)
) AS login_count,
<!--- Get me the number of unique logins for this day. --->
(
SELECT
COUNT( DISTINCT l.useraccount_id )
FROM
useraccount_login l
WHERE
l.date_logged_in BETWEEN d.today AND (d.today + 1)
) AS unique_login_count
FROM
@dates d
;
Want to use code from this post? Check out the license.
Reader Comments
You might be better suited to have a permanent Pivot table in your database, that has 1 column (i) and however many rows you want (I actually have 3 tables with 100, 1000, and 10000 rows). Then, instead of having to worry about a table with days, your query becomes:
SELECT
DATEADD(DAY,0-p.i,dates.today) AS TheDate,
COUNT( * ) AS LoginCount,
COUNT(DISTINCT l.useraccountid) AS UniqueLoginCount
FROM
( SELECT DATEADD(DAY,DAY(GETDATE())-1,DATEADD(YEAR,YEAR(GETDATE())-2000,{d '2000-01-01'})) AS Today ) AS dates
INNER JOIN Pivot100 AS p ON (p.i BETWEEN 0 AND 13)
LEFT OUTER JOIN useraccountlogin AS l ON (l.dateloggedin = DATEADD(DAY,1-p.i,dates.today))
GROUP BY p.i
ORDER BY 1 DESC
I know it looks a bit screwy, but bear with me.
The dates table is now made without CF's help by creating a subquery that returns one row with today's date, minus the hours, minutes, and seconds.
As I mentioned, your Pivot100 table was 1 column (i) with 101 rows of values 0-100 (yeah, yeah, I know, but it's useful to have a 0 sometimes).
The INNER JOIN looks weird because the ON clause doesn't actually relate the two tables. But it doesn't have to! The ON clause limits the Pivot100 table to just 14 rows, and the INNER JOIN then becomes a cartesian product. (You could do the same thing with a CROSS JOIN and then put the restriction in the WHERE clause, but I like the locality of having it in the ON clause. And not all DMBSs support CROSS JOINs.) You thus get a table that looks like:
today / i
2006-09-11 / 0
2006-09-11 / 1
2006-09-11 / 2
...
2006-09-11 / 13
You can then join to the login table based on (Today - i) days matching up with the login date. Since you now have all of your dates, you don't have to worry about it eliminating any of them, because it's a LEFT OUTER JOIN.
Voila! Same result, but in pure SQL.
Every database should have pivot tables. They make life sooo* much easier.
Rick, you magnificent bastard. I never thought of having pivot tables just sitting there to use. That is a great idea! The SQL looks a bit more crazy, but I totally understand what you are doing. And, what's nice about have the in-line "view" for the pivot table is that you can create as many columns as you like. Very cool - a lot of times I do create several columns.
Pure dynamite!
This looks hopeful for solving my problem. Old stuff but maybe we can revisit it. I know about the pivot table trick, but the thing I see is that the date stays the same.
What I want is to be able to show (using this example) logins for a month for each person.
What I'm doing now is creating a temp table with 1 field (theDate) and filling it from startDate to endDate for the report. Then doing the joins to get the data out. I'm thinking there must be an easier way using a pivot table to maybe add the pivot data to the startDate and pulling the data that way.
But one other question that your code left me wondering about. Are you able to join a table in memory and a table on a server? Or did you pull all the data from the server and then do the join to the table in memory?
@Don,
You can definitely join an in-memory table to a table on the server - that is one of the biggest benefits to using the in-memory tables.
As far as the dates and pivot tables, what I will often do for that is use a pivot table to create a set of dates based on month offsets. For example, I might use the first 12 rows of a pivot table to get the last 12 months worth of data where each pivot ID is a month of dates:
DECLARE @ThisMonth DATETIME;
SET @ThisMonth = '2009/02/01';
SELECT
. . . . ( 1 - p.id ) AS month_offset,
. . . . (
. . . . . . . . DATEADD( M, (1 - p.id), @ThisMonth )
. . . . ) AS month_start
FROM
. . . . pivot1000 p
WHERE
. . . . p.id <= 12
This will get the last 12 months. This resultant table can then be joined to another table to get grouped data.
Silly me. I was thinking of something else. I use ColdFusion and was thinking of a query of query going joining to a db, which can't be done as far as I know.
Anyway, what I was looking for was coming up with a query result showing the next 2 weeks and both free and taken days. This is for a small hotel. They want potential guests to pick a date and see the number and types of rooms available for 2 weeks on either side of that date.
This put me on the right track. One thing for the dates, I now use the day of the year vice actual dates. That way I just need 2 pivot tables, 1 for normal years and 1 for leapyears. Numbered from 1 to 365/366. When I get my date numbers I just add that many days to Dec 31 of the previous year and get the date.
Is there reason or argument against using QueryNew() to build a temp table of data?
I was just setting up a page that will need to hold data from two databases, and wasn't sure which would work out better.
I don't know if there is an argument against it but I use that technique when pulling from many data sources such as RSS feeds.
Actually what I do is just create a temp table and the insert the data into it after massaging it to match.
No real need to do a QueryNew() unless you can't create temp tables on the database or if you don't want to clean up after yourself I guess. ie delete the temp table.
@Paul, @Don,
You can certainly use the Query of Queries to join multiple data form the same or different data sources. Query of queries is AMAZING... except when it is not :) It can cause problem.
It depends on what you are trying to do and on what SQL server you have. For example, if you just have a small set up data to create, sometimes going to the SQL server can be much faster (such as when jointing to a pivot table).
Of course, if you are using MySQL, you can't create in-memory data tables, so you have to take that into account when choosing your approach.
I'm wondering if a pivot table is applicable a solution to this item I'm working on.
Currently, I have separate tables holding the various options available for drop down options through the back-end of my web site. I'm combining them into one ListOptions table. One one particular drop down box, I have a chained select (change the value in one drop down and the 2nd drop down changes). I'm wondering how to setup the values in a single table. Currently, I'm setting up the columns in the table with:
id, listoption, listdescription, sortorder, and listtype
So to give an example, lets use cars (I know you prefer to use girls but I digress) :)
In the first list, you have ford, chevy, dodge, etc. Once the selection changes from Ford to Chevy, the second list will change to show the vehicles chevy builds. how would I populate this in the database with the columns listed above keeping the database efficient?
Hi Ben! Long time no see!
I'm in a new job at Albany Med and busy, busy
I was looking at this page and noticed you need to add an 'm' in the title for the word Summarizing