ColdFusion CFQuery MaxRows Not A Bad Compromise When TOP Not Available
Not so long ago, I was introduced to the MaxRows attribute of the ColdFusion CFQuery tag (thanks Joe Rinehart). It's one of those things that you can't believe you didn't know about, but sure enough, there it is. If anyone else doesn't know what it does, it limits the number of records that can be returned by a given query. So, for instance, the following query:
<!--- Limit the record set using MaxRows. --->
<cfquery name="qMovie" datasource="..." maxrows="30">
SELECT
*
FROM
movie
ORDER BY
name ASC
</cfquery>
... will only return the TOP 30 rows to the ColdFusion memory space since it's MaxRows attribute is set to "30." This is similar to the way the TOP directive works in SQL server:
<!--- Limit the record set using TOP. --->
<cfquery name="qMovie" datasource="...">
SELECT TOP 30
*
FROM
movie
ORDER BY
name ASC
</cfquery>
Both the above queries accomplish the same thing by limiting the max number of returned records to 30. However, one is a ColdFusion directive (MaxRows) and one is a SQL directive (TOP). When compared though, MaxRows does pretty well. When I "select *" from a database that has 50,000+ records, using the TOP directive executed in about 30-40 ms. MaxRows returned the same record set in around 125-140 ms. This is about 3-4 times slower... but when you compare that to the query that ran without either the TOP or the MaxRows directive (on average 400-450 ms), it's a pretty darn good compromise when the SQL TOP directive is not available.
I don't know why MaxRows executes faster than the query that has neither TOP nor MaxRows. I assume that ColdFusion is basically cutting off the database stream once the "MaxRows" number of records have been returned (but this is just theory). I wonder how this affects the SQL Server. I wonder if, from the SQL standpoint, it is still returning all those rows? Does the SQL server still have to amass the massive record set?
Efficiency aside, MaxRows is sometimes the only option available, such as when performing a ColdFusion query of queries (which is where Joe jumped in and pointed out my ignorances).
Want to use code from this post? Check out the license.
Reader Comments
I know that OLEDB, and I think also ODBC, can tell the SQL Server that it only wants the first n rows, and I know that the SQL Server will optimize for that. (I think I remember reading something about SQL Server even being smart enough to throw away unneeded data mid-query once it has fulfilled it's minimum result set.)
And, of course, always remember that Postgres and mySQL use the LIMIT n syntax, DB2 uses the FETCH FIRST n ROWS ONLY syntax, and Oracle uses ROWNUM in the WHERE clause.
PITA, eh?
What? They couldn't come up with a way to make "FETCH FIRST n ROWS ONLY" even longer? Maybe throw a "please" at the end :)
Rick, thanks for the insight. Good to know that optimization does occur and that it occurs in both the ColdFusion server and the SQL server (potentially).
Here is how I would do it in MySQL
The First 10
Select * from table order by ID Limit 10
The Last 10
Select * from table order by ID Desc Limit 10
Actually, it's even worse than that. For most queries via CF, it's a good idea to tell the DB2 engine that the query you are about to run is not going to be updated. It will avoid some locking problems and speed up your query by around 5%. In SQL server, it's pretty straightforward:
SELECT TOP 10 foo, bar
FROM mytable (NOLOCK)
How do you do it in DB2?
SELECT foo, bar
FROM mylib.mytable
FETCH FIRST 10 ROWS ONLY
FOR FETCH ONLY
Someone at IBM really, really liked the word "ONLY".
@Rick,
That is really interesting. I have never seen that before. Is that something you should do only if you are not expecting simultaneous hits to the DB? Or is that something you can do any time you are just reading?
For NOLOCK, it allows your query to perform dirty reads, which may or may not be applicable to your situation. For data that changes very infrequently but is queried often (especially as part of a large join), dirty reads may improve performance by reducing table contention.
But!
Dirty reads can also read data that never actually gets committed! If a transaction fails and is rolled back, a dirty read might get the bogus data. I only use it for lookup tables that almost never change.
Done completely in DB2:
SELECT foo, bar
FROM mylib.mytable
FETCH FIRST 10 ROWS ONLY
OPTIMIZE FOR 10 ROWS
FOR FETCH ONLY
WITH UR
That "optimize" part kindof acts like the server-size equivalent of CFQuery's BLOCKFACTOR attribute, while the "UR" part represents "uncommitted read", the dirty part.
DB2 is nothing if not verbose. It may seem like over-optimization, but it's good to know how to do just in case.
Here's a tidbit to make maxrows more useful. I had a cfc method that sometimes I wanted to return all records, sometimes top two or three. No problem, pass in a 2 or 3 as an argument and drop it in maxrows. So far so good. But what if I want to see all records? It turns out that passing in -1 works.
@Jim,
Oh cool, didn't know that.
I just found this post:
http://www.coldfusionjedi.com/index.cfm/2009/8/12/MAXROWS-Attribute--Not-as-good-as-I-thought
It is worth reading if you are trying to use MAXROWS to improve performance. It probably is not doing what you expect.
Beware using the maxrows with the SQL Statement WHERE IN () Because it will give the query rows result as many as you define it in the maxrows. Instead of million records you expected, it will return only X records. This problem might come from the Coldfusion SQL Server driver.
@Jusuf,
I am not sure what you are saying? The use of MaxRows is designed to limit the number of records reported in the query? What are you saying about how it interacts with IN()?
Let's say you query 1M rows with maxrows=10. When you dump the recordresult it will return 1M however if you have the SQL Statement using subquery IN for some reasons it returns 10 instead of 1M.
One of the most interesting with Maxrows is you can loop from 1-1M using startrow (1,11,21,..) and endrow (10,20,30,...) and get 10 records per page.
@Ben Nadel, So sorry I think I got mess up here. Just forgot/delete my previous comments. What I find out is when I used the SELECT TOP n only getting the record result from 1-n however the maxrows it's selecting n records from the result. This will make different for example you want to sort the order by date descending. The TOP n only sort on the top n records.
Again, sorry on my previous records. Thanks.
@Jusuf,
No worries at all - that's what we're here for - good conversation, no matter where it leads.
Suppose I have a cfquery that has maxrows attribute set to 50. What do i need to do in order to get the 51st row using the same cfquery without changing the maxrows attribute value?
Thanks.
@Ben,
Thanks for doing the legwork on this... It's good enough for me to presume that MySQL's LIMIT is somewhere on par with TOP over maxrows.
@Sharath,
With MySQL you can forget about maxrows and do it like this:
SELECT column FROM table LIMIT 1 OFFSET 50
Other than that, you could set maxrows to 51 and extract the data you require like so: