Killing Slow MySQL Queries In An Emergency In Lucee CFML 5.3.7.47
At InVision, we recently upgraded our MySQL database servers to use MySQL 5.7.x. And, while I'm excited for access to new features like the native JSON
column type and the sys
performance schema, the upgrade did have some bumps. In particular, we had a subset of SQL queries that started running with terrible performance (typically those that have very large IN(id-list)
clauses). I had to refactor a number of queries; but, while that code was being changed, I had to put together an emergency utility that would allow me to start killing SQL queries in production. I thought this was a fun little exercise that would be worth sharing in Lucee CFML 5.3.7.47.
ASIDE: Before I started killing queries manually, I did try adding a
timeout
to my<cfquery>
tags. Unfortunately, this did not seem to have any impact. I am not sure why thetimeout
didn't kill long-running queries in Lucee CFML - I know that this mostly worked when the code was running on Adobe ColdFusion.
This emergency utility has two main aspects: the ProcessListGateway.cfc
, which gives me access to the MySQL processlist
that returns all the database thread-states; and, a View that lists the currently-running queries and allows me to kill them individually.
Let's look at my ColdFusion component gateway first. It has two main methods: one that returns the processlist
and one that executes a KILL QUERY
SQL statement against a given thread ID. Each of these queries takes a datasource
against which to run the query since we have multiple replicas in production.
component
output = false
hint = "I provide low-level query methods for MySQL process-list interactions (for emergency use)."
{
/**
* I return the datasources that can be inspected (and subsequently affected).
*/
public array function getDatasources() {
return([ "primary", "replica", "replica2" ]);
}
/**
* I get the process list info for the given datasource.
*
* @datasource I am the ColdFusion datasource being inspected.
* @minTime I am the time over which a query must be running in order to be included.
*/
public query function getProcessList(
required string datasource,
required numeric minTime
) {
```
<cfquery name="local.results" datasource="#datasource#">
/* DEBUG: processListGateway.getProcessList(). */
SELECT
pl.id,
pl.time,
pl.state,
LEFT( pl.info, 300 ) AS info
FROM
INFORMATION_SCHEMA.PROCESSLIST pl
WHERE
pl.time > <cfqueryparam value="#minTime#" sqltype="integer" />
AND
pl.state != ''
ORDER BY
pl.time DESC
;
</cfquery>
```
return( results );
}
/**
* I kill the query with the given ID in the given datasource.
*
* CAUTION: Will throw an error if the given query ID / thread cannot be found.
*
* @datasource I am the ColdFusion datasource being inspected.
* @queryID I am the thread being killed.
*/
public void function killQuery(
required string datasource,
required numeric queryID
) {
```
<cfquery name="local.results" datasource="#datasource#">
/* DEBUG: processListGateway.killQuery(). */
KILL QUERY <cfqueryparam value="#queryID#" sqltype="integer" />;
</cfquery>
```
}
}
There's not too much going on here. I am using Tag Islands to run SQL inside <cfscript>
, which is really just the best thing since sliced-bread! But, other than that, you can't do much. Since this is for emergency use only and running in production (behind an administrative login of course), I wanted to make this was as restrictive as possible.
With this ColdFusion gateway in toe, I then created a simple UI (User Interface) which lists out the running queries in the processlist
and allows me to selectively run a KILL QUERY
command against a given thread-ID:
<cfscript>
// Param URL variables.
param name="url.datasource" type="string" default="";
param name="url.minTime" type="numeric" default="5";
param name="url.queryID" type="numeric" default="0";
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
processListGateway = new ProcessListGateway();
datasources = processListGateway.getDatasources();
errorMessage = "";
// Make sure that we are consuming one of the valid datasources. Since we are about
// to start killing queries, we want to make sure this is as locked-down as possible.
if ( ! url.datasource.len() ) {
url.datasource = datasources.first();
} else if ( ! datasources.contains( url.datasource ) ) {
throw( type = "InvalidDatasource" );
}
// If we have a query ID / thread in the process-list, let's try to kill it.
if ( url.queryID ) {
try {
processListGateway.killQuery( url.datasource, url.queryID );
} catch ( "database" error ) {
errorMessage = "Query with ID [#url.queryID#] could not be killed. Message: #error.message#";
}
}
// NOTE: Since the goal here is to kill LONG-RUNNING queries, we're going to supply
// a min-time that will only return queries that have run LONGER than the given
// threshold. This way, our process-list won't be littered with common queries that
// happen to be running at the time this ColdFusion script is run.
processList = processListGateway.getProcessList( url.datasource, url.minTime );
</cfscript>
<cfoutput>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>
Process List (Emergency Use Only)
</title>
<link rel="stylesheet" type="text/css" href="./styles.css" />
</head>
<body>
<h1>
Process List (Emergency Use Only)
</h1>
<p>
<strong>Datasources:</strong>
<cfloop value="datasource" array="#datasources#">
<cfif ( datasource eq url.datasource )>
« <strong><a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a></strong> »
<cfelse>
<a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a>
</cfif>
</cfloop>
</p>
<h2>
Current Process List For "#encodeForHtml( url.datasource )#"
</h2>
<cfif processList.recordCount>
<table width="100%" border="1" cellpadding="5" cellspacing="0">
<thead>
<tr>
<th scope="col"> ID </th>
<th scope="col"> Time </th>
<th scope="col"> State </th>
<th scope="col"> Info (truncated) </th>
<th scope="col"> Actions </th>
</tr>
</thead>
<tbody>
<cfloop query="#processList#">
<tr>
<td>
#encodeForHtml( processList.id )#
</td>
<td>
#numberFormat( processList.time )#s
</td>
<td>
#encodeForHtml( processList.state )#
</td>
<td class="sql">
#encodeForHtml( processList.info )#
</td>
<td class="actions">
<a href="?datasource=#encodeForUrl( url.datasource )#&queryID=#encodeForUrl( processList.id )#">Kill query</a>
</td>
</tr>
</cfloop>
</tbody>
</table>
<cfelse>
<p>
<em>There are no running queries in the selected datasource.</em>
</p>
</cfif>
<!--- Show any error from attempting to kill a query. --->
<cfif errorMessage.len()>
<h2>
Error Message
</h2>
<p>
#encodeForHtml( errorMessage )#
</p>
</cfif>
</body>
</html>
</cfoutput>
As you can see, this is just some old-school requests-response life-cycle style programming. And by, "old-school", what I really mean is solid, battle-tested, and just clever enough to get the job done. And, if I kick off some SLEEP()
queries in MySQL and run this ColdFusion page, we get the following output:
As you can see, the slow SQL queries start showing up; and, I'm able to kill the ones that I know will start crushing the database CPU.
Obviously, killing SQL queries in production is a "Bad Thing"™ since a prematurely-terminated query will almost certainly lead to a user-facing issue (data not loading, for example). However, in an emergency situation, I'll take the occasional user-facing issue over 100% CPU utilization every day of the week. And now, I have this release valve in Lucee CFML 5.3.7.47 that I can exercise if - and only if - I need to.
Epilogue: Why Not Just Run These SQL Queries Directly in the Database?
You may be asking yourself, why go through all the ColdFusion rigmarole? Why not just run the processlist
and kill query
SQL statements directly against the database. Well, for starters, doing that once is easy - doing that a number of times is challenging and tedious. Being able to point-and-click in a ColdFusion page makes this much faster and repeatable.
But, more importantly, you don't just give people WRITE-access to a production database. If I wanted to run a kill query
statement in production without this code, I'd have to write the SQL query and then open a ticket for our database team to run it on my behalf. Given that our database team is busy, a ticket like this could take hours or event days to fulfill. And, given the fact that I needed to do this multiple times during the emergency window, this would not be a sustainable approach.
By embedding this in the ColdFusion application (behind a secure administrative login), I make it flexible and repeatable and secure. And, by locking the SQL down to accepting only a datasource
and a queryID
, it's significantly more secure when compared to an engineer with write-access to the database running the query on my behalf.
Want to use code from this post? Check out the license.
Reader Comments
@All,
In case this is interesting to anyone, here's the ColdFusion code I was using to spawn the long-running queries for testing:
I just love love love the ability to run array iteration in parallel in Lucee CFML. So many great use-cases for it.
@Ben
I could be wrong, but I believe this only defines how long the CF template should wait for a response, but the underlying MySQL process continues along completely unaware of any timing constraints.
@Chris,
At least on the Adobe ColdFusion, when looking at the "general log", it looks like a
KILL QUERY
is issued:www.bennadel.com/blog/3305-cfquery-timeout-uses-kill-query-command-with-mysql-5-driver-in-coldfusion.htm
.... I'll have to do the same kind of exploration for Lucee CFML.
@All,
I tried to write up a case study of the refactoring that this
KILL QUERY
approach gave me time to figure out:www.bennadel.com/blog/4001-case-study-removing-massive-in-id-list-clauses-for-performance-in-mysql-5-7-32-and-lucee-cfml-5-3-7-47.htm
I was able to bring the slowest execution times for a set of queries down from "catastrophic" to simply "poor user experience". Not a perfect solution; but, enough that it stopped the bleeding.
TOTALLY enjoyed this post, about Killing SLOW MySQL queries, but alas, we use MSSQL (with Lucee 5.3.8.206)... and processList is not part of the INFORMATION_SCHEMA. Suggestions?
@Marilou,
I'm glad you enjoyed the post. Unfortunately, I haven't used MS SQL Server in over a decade, so most of what I used to know about it has slowly disappeared. I just tried to Google for some related information and nothing super helpful came up. Sorry :(