Making SQL Queries More Flexible With LIKE In MySQL 5.7.32 And Lucee CFML 5.3.7.47
While you might stand-up something like Elasticsearch, Lucene, or Solr in order to provide robust and flexible text-based searches in your ColdFusion application, your relational database is more than capable of performing (surprisingly fast) pattern matching on TEXT
and VARCHAR
fields using the LIKE
operator. This is especially true if the SQL query in question is already being limited based on an indexed value. At InVision, I often use the LIKE
operator to allow for light-weight text-based searches. And, as of late, I've been massaging the inputs in order to make the matches even more flexible, allowing for some slightly fuzzy matching in Lucee CFML 5.3.7.47.
With the SQL LIKE
operator, the underscore (_
) matches any single character. And, the percent (%
) matches any arbitrary number of characters, including zero characters. To make a LIKE
operation more fuzzy / flexible, all I do is replace some of the input characters with the %
wildcard character. This gives the user just a little more wiggle-room when trying to match against values in the MySQL database.
And, of course, I am using Regular Expressions (RegEx) to massage the inputs. Basically, I take any non-Word character and replace it with a %
. However, the "Word" character class contains the underscore, which is a special character in LIKE
operations. As such, I need to replace _
with a %
as well.
ASIDE: If you're not using RegEx pattern matching on-the-daily in your programming life, you are missing out on a tremendous amount of power. I urge you to watch my video presentation: Regular Expressions, Extraordinary Power. Literally not a day goes by without using RegEx to solve some sort of problem whether in the code itself or in a codebase search in SublimeText.
To see this in action, I've put together a small ColdFusion demo in which I have a user-defined function (UDF), makeKeywordsLikable()
that takes a String an returns a "pattern" that can be used in a LIKE
operation. Then, I try to run this UDF against some static strings:
<cfscript>
// In order to make the SEARCH a little more flexible, we're going to transform the
// target patterns into something a little more "likable" by replacing certain
// characters with wildcard characters.
// --
// CAUTION: Normally, I would ALWAYS USE the CFQueryParam tag when binding any SQL
// parameter within a CFQuery tag; however, for the sake of the demo and of brevity,
// I am omitting the param tag. Do not do this in production.
```
<cfquery name="matches" result="result" datasource="testing">
SELECT
'Testing Comments' LIKE '#makeKeywordsLikable( "Test Comment" )#',
'[A56] Redesign' LIKE '#makeKeywordsLikable( "56 - Design" )#',
'Rebok - Web Properties' LIKE '#makeKeywordsLikable( "Rebok Web" )#',
'[JimmyJane][Internal] Admin' LIKE '#makeKeywordsLikable( "Jimmy Jane Admin" )#',
'%_' LIKE '#makeKeywordsLikable( "%_" )#'
;
</cfquery>
```
dump( matches );
echo( "<pre>#result.sql.reReplace( '(?m)^\t{2}', '', 'all' )#</pre>" );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I take the given keywords and return a value that can be used in a LIKE operation.
* The returned pattern is intended to be more flexible than a simple "%value%" match.
*/
public string function makeKeywordsLikable( required string keywords ) {
var flexibleKeywords = keywords
// Replacing any string of non-WORD characters with the wildcard character.
// When using LIKE, the underscore ("_") is a special character that matches
// any one / single character. Since it (underscore) is also a "word"
// character in RegEx parlance, we're going to replace it as well.
.reReplace( "[\W_]+", "%", "all" )
// Replacing any string of wildcard characters with a single wildcard.
.reReplace( "%{2,}", "%", "all" )
// Strip-off any leading / trailing wildcard characters since we'll be
// appending those again down below.
.reReplace( "^%|%$", "", "all" )
;
// EDGE CASE: If we've ended-up with ZERO non-wildcard characters at this point,
// let's just return the original value. We don't have enough to match against;
// and, MySQL will happily match "%%" against every other String value.
if ( ! flexibleKeywords.len() ) {
// Let's escape any special characters so they have no meaning in LIKE.
return( keywords.reReplace( "([%_])", "\\1", "all" ) );
}
return( "%" & flexibleKeywords & "%" );
}
</cfscript>
As you can see in our SELECT
expression, none of the static strings exactly match their associated input values. However, when we execute this ColdFusion and MySQL code, every LIKE
operation results in 1
(a positive match). That's because we've made our input pattern more flexible. Here is the SQL that is actually being executed against the database (retrieved from result.sql
on our CFQuery
tag):
SELECT
'Testing Comments' LIKE '%Test%Comment%',
'[A56] Redesign' LIKE '%56%Design%',
'Rebok - Web Properties' LIKE '%Rebok%Web%',
'[JimmyJane][Internal] Admin' LIKE '%Jimmy%Jane%Admin%',
'%_' LIKE '\%\_'
;
As you can see, the makeKeywordsLikable()
user-defined function has simplified the input and merged-in a number of %
wildcard characters. Except in the edge-case where no non-wildcard characters could be found. In that case, we use the original input but escape any wildcards.
At first, when I started using this approach, I was afraid that the database performance would tank. I was always told that LIKE
was a slow operation. And, it might be when compared to an index-based look-up. However, when this technique is used in conjunction with an index-based search in my ColdFusion application, I've found it to be surprisingly fast and pleasantly flexible. Overall, I think it provides a nice user experience (UX).
Want to use code from this post? Check out the license.
Reader Comments
Just curious. Do you have a broader set of unit tests for this?
How well does this work with different SQL databases? Does it work when dbtype="query"?
I've been using a modified version of the SmartSearch UDF from CFlib. (It doesn't use regex.)
https://cflib.org/udf/smartSearch
(We added SQLi detection & support for QoQ by using LOWER().)
@James,
What are these "unit tests" you speak of 🤪 😂 As far as database support, I believe the
%
and_
characters are part of core SQL - I don't think - but I'm not 100% sure - that any of this is MySQL-specific. And, to be honest, I haven't done a query-of-query in a while; though, I'm pretty sure there is case-sensitivity issues in QoQ (as you pointed out).I haven't seen that SmartSearch() function before - it looks intense!! I wish it had some examples of inputs/outputs in the JavaDoc. Scanning the code, it looks like it produces and
AND
andOR
style SQL condition, with some Fuzzy vs. Exact matching. Sounds like you've had a lot of success with it.