How I Stop Spammers On My ColdFusion Blog
There have been some posts recently about how people stop spammers from submitting comments on their blogs and contact forms, so I thought I would share mine as it has been working near flawlessly. I wanted to keep mine simple. I don't care for Captcha as I find it hard to read; the de-spamming process shouldn't keep out people as well as spam bots. I wanted to keep it HTML, keep it easy for humans, hard for computers. What I came up with was math.
Now, I know what you're thinking, "Math, harder for a computer?" No, not at all. Math will always be harder for humans. The difference here is that reading the math will be easy for humans, hard for robots. What I do is provide a mandatory math equation for every submit form:
To cut down on spam, please solve this math equation: ( 2 + 10 )?
To make this easy for humans and harder for computers, the source code for this equation looks like:
To cut down on spam, please solve this math equation: (
<span class="despamminq">&#45;</span> <span class="despammint">&#54;</span> <span class="despamminz">&#49;&#57;</span> <span class="despammina">&#49;&#51;</span> <span class="despamming34">&#57;</span> <span class="despamminzzz">&#45;</span> <span class="despamming01">&#50;</span> <span class="despammingj">&#43;</span> <span class="despammin4">&#49;&#55;</span> <span class="despamming4">&#43;</span> <span class="despammingg">&#54;</span> <span class="despamminnn">&#49;&#48;</span> <span class="despamminzzz">&#43;</span> <span class="despamminz">&#51;</span> <span class="despammint">&#49;&#49;</span> <span class="despammin09">&#49;&#54;</span>
)?<br />
Furthermore, if you were to copy and paste the equation from the web browser (at least in Firefox), it would look like:
To cut down on spam, please solve this math equation: ( - 6 19 13 9 - 2 + 17 + 6 10 + 3 11 16 )?
So, what is going on here? First of all, let me say that it is not fool proof. Robots can figure it out, but not yet, it seems. The security here has several aspects:
Encoding
All the character values in the above equation are ASCII encoded. That means that instead of being represented by the physical character, such as "A", the characters are represented by the escaped ASCII value (an HTML numeric character reference), such as "&#65;". This, of course, is only in the source code of the page. When viewing the web page, the user sees the easy-to-read evaluated value, "A".
Again, this is not fool-proof. Robots can decode ASCII values. It just adds an obstacle that they have to figure out.
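To make the encoding step concrete, here is a minimal sketch that encodes the visible equation from the example one character at a time and prints both forms. The variable names are only for illustration, and encoding the whole string in one pass is a simplification; in the real markup each piece sits in its own span.

```cfml
<cfscript>
    // Plain text of the visible equation from the example above.
    PlainText = "2 + 10";

    // Build the encoded form one character at a time.
    Encoded = "";
    for ( CharIndex = 1 ; CharIndex LTE Len( PlainText ) ; CharIndex = (CharIndex + 1) ){
        Encoded = Encoded & "&##" & Asc( Mid( PlainText, CharIndex, 1 ) ) & ";";
    }
</cfscript>

<cfoutput>
    <!--- What a robot reading the page source has to deal with. --->
    Source: #HTMLEditFormat( Encoded )#<br />
    <!--- What a human sees once the browser evaluates the entities. --->
    Rendered: #Encoded#
</cfoutput>
```

Here "2" becomes &#50;, the space becomes &#32;, "+" becomes &#43;, and "10" becomes &#49;&#48;, while the rendered line still reads as 2 + 10.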
Randomly Dirty
The next obstacle is that the equation is randomly dirty. As you can see from the pasted value, the equation contains much more than the two values and a single operator. It is interspersed with random numbers and operators. Again, not fool proof, just an obstacle.
CSS Leveraged
The final obstacle is CSS. The way I get the equation to show properly is to hide many of the spans being displayed. In my example, there are several CSS classes that are displayed and several that are hidden. By allowing several classes to hide and several classes to show, it should make finding a pattern even harder for the spam-bot.
Now, none of these is fool proof. Even all combined, a robot could be programmed to figure it out. The point here is that these three obstacles combined make it tough. So far, with my new despamming methodology in place, I have not gotten a single spam post or contact form submission. I had a bot hit my comment page close to 70 times in one day and nothing got through.
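As a rough sketch of the CSS side, the style rules could be generated from the same two class lists that drive the equation. The class names below are inferred from the sample markup above (the three visible ones are the spans holding 2, + and 10), and hiding with display: none is an assumption; the post does not say which CSS property does the hiding.

```cfml
<!--- Class lists inferred from the sample markup; treat them as placeholders. --->
<cfset VisibleClasses = "despamming01,despamming4,despamminnn" />
<cfset HiddenClasses = "despamminq,despammint,despamminz,despammina,despamminzzz" />

<style type="text/css">
    <cfoutput>
        <!--- Decoy spans: removed from the rendered page. --->
        <cfloop index="ClassName" list="#HiddenClasses#">
            span.#ClassName# { display: none ; }
        </cfloop>
        <!--- Real spans: keep the normal inline display. --->
        <cfloop index="ClassName" list="#VisibleClasses#">
            span.#ClassName# { display: inline ; }
        </cfloop>
    </cfoutput>
</style>
```

Keeping the lists in one place means the stylesheet and the equation markup cannot drift apart when classes are added or renamed.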
I have a custom function that helps me create this de-spamming text. It takes the two values, the operator, the list of classes that are visible, and the list of classes that are hidden:
<cffunction name="DeSpamEquation" access="public" returntype="string" output="false"
    hint="Returns the math equation with some extra stuff in there.">

    <!--- Define arguments. --->
    <cfargument name="Value1" type="string" required="yes" />
    <cfargument name="Value2" type="string" required="yes" />
    <cfargument name="Operator" type="string" required="yes" />
    <cfargument name="VisibleClasses" type="string" required="yes" />
    <cfargument name="HiddenClasses" type="string" required="yes" />

    <cfscript>
        // Define the local scope.
        var LOCAL = StructNew();

        // Create results string.
        LOCAL.Result = CreateObject( "java", "java.lang.StringBuffer" ).Init();

        // Create a random number of buffer zones.
        LOCAL.BufferSize1 = RandRange( 1, 5 );
        LOCAL.BufferSize2 = RandRange( 1, 5 );
        LOCAL.BufferSize3 = RandRange( 1, 5 );

        // Get the class list size.
        LOCAL.VisibleClassesLength = ListLen( ARGUMENTS.VisibleClasses );
        LOCAL.HiddenClassesLength = ListLen( ARGUMENTS.HiddenClasses );

        // Add a hidden operator.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "-" ) );
        LOCAL.Result.Append( "</span> " );

        // Create a random number of fake spans.
        for ( LOCAL.Index = 1 ; LOCAL.Index LTE LOCAL.BufferSize1 ; LOCAL.Index = (LOCAL.Index + 1)){
            LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
            LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( RandRange( 3, 20 ) ) );
            LOCAL.Result.Append( "</span> " );
        }

        // Add a hidden operator.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "-" ) );
        LOCAL.Result.Append( "</span> " );

        // Add the first value.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.VisibleClasses, RandRange( 1, LOCAL.VisibleClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( ARGUMENTS.Value1 ) );
        LOCAL.Result.Append( "</span> " );

        // Add a hidden operator.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "+" ) );
        LOCAL.Result.Append( "</span> " );

        // Create a random number of fake spans.
        for ( LOCAL.Index = 1 ; LOCAL.Index LTE LOCAL.BufferSize2 ; LOCAL.Index = (LOCAL.Index + 1)){
            LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
            LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( RandRange( 5, 20 ) ) );
            LOCAL.Result.Append( "</span> " );
        }

        // Add the operator.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.VisibleClasses, RandRange( 1, LOCAL.VisibleClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( ARGUMENTS.Operator ) );
        LOCAL.Result.Append( "</span> " );

        // Create a random number of fake spans.
        for ( LOCAL.Index = 1 ; LOCAL.Index LTE LOCAL.BufferSize2 ; LOCAL.Index = (LOCAL.Index + 1)){
            LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
            LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( RandRange( 6, 20 ) ) );
            LOCAL.Result.Append( "</span> " );
        }

        // Add the second value.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.VisibleClasses, RandRange( 1, LOCAL.VisibleClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( ARGUMENTS.Value2 ) );
        LOCAL.Result.Append( "</span> " );

        // Add a hidden operator.
        LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
        LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "+" ) );
        LOCAL.Result.Append( "</span> " );

        // Create a random number of fake spans.
        for ( LOCAL.Index = 1 ; LOCAL.Index LTE LOCAL.BufferSize3 ; LOCAL.Index = (LOCAL.Index + 1)){
            LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
            LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( RandRange( 3, 20 ) ) );
            LOCAL.Result.Append( "</span> " );
        }

        // Return the span test.
        return( LOCAL.Result.ToString() );
    </cfscript>
</cffunction>
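Calling the function might look something like the sketch below. The operand ranges, the class lists, and the de_spam_answer field name are assumptions made for illustration, and the call assumes both DeSpamEquation() and the ToAsciiString() helper it relies on are available on the page.

```cfml
<!--- Hypothetical operand ranges; keep them small so the math stays easy. --->
<cfset REQUEST.DeSpam = StructNew() />
<cfset REQUEST.DeSpam.Value1 = RandRange( 1, 10 ) />
<cfset REQUEST.DeSpam.Value2 = RandRange( 1, 20 ) />
<cfset REQUEST.DeSpam.Operator = "+" />

<!--- Build the obfuscated markup once per page request. --->
<cfset REQUEST.DeSpam.Markup = DeSpamEquation(
    REQUEST.DeSpam.Value1,
    REQUEST.DeSpam.Value2,
    REQUEST.DeSpam.Operator,
    "despamming01,despamming4,despamminnn",
    "despamminq,despammint,despamminz,despammina,despamminzzz"
    ) />

<cfoutput>
    To cut down on spam, please solve this math equation: (
    #REQUEST.DeSpam.Markup#
    )?<br />
    <!--- Hypothetical field that collects the visitor's answer. --->
    <input type="text" name="de_spam_answer" size="5" />
</cfoutput>
```

Storing the operands in REQUEST.DeSpam keeps them available for the encrypted hidden field described next.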
Outputting the math equation is not the only part. We need a way to test it. Since my math equation is randomized, it never shows up the same for page loads (or at least, with a very low probability). To make sure I only get proper answers, I send the values and operator through the form as well. These values are encrypted so that only I can read them on the server:
<input
type="hidden"
name="de_spam"
value="#UrlEncodedFormat( Encrypt( "#REQUEST.DeSpam.Value1#,#REQUEST.DeSpam.Value2#,#REQUEST.DeSpam.Operator#", "spam-key-here" ) )#" />
Then on the server, I can decrypt the values and check them against the user's submitted solution.
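A minimal sketch of that server-side check follows, assuming the answer arrives in a hypothetical de_spam_answer field and that only addition and subtraction are used. Because the hidden value was passed through UrlEncodedFormat() before being written into the form, it is decoded again here before decryption.

```cfml
<cfparam name="FORM.de_spam" type="string" default="" />
<cfparam name="FORM.de_spam_answer" type="string" default="" />

<!--- Assume the worst until the answer checks out. --->
<cfset IsHuman = false />

<cftry>
    <!--- Undo the UrlEncodedFormat() applied in the form, then decrypt. --->
    <cfset Equation = Decrypt( URLDecode( FORM.de_spam ), "spam-key-here" ) />

    <!--- The decrypted value is the list "Value1,Value2,Operator". --->
    <cfset EquationValue1 = ListGetAt( Equation, 1 ) />
    <cfset EquationValue2 = ListGetAt( Equation, 2 ) />
    <cfset EquationOperator = ListGetAt( Equation, 3 ) />

    <!--- Work out the expected answer for the supported operators. --->
    <cfif EquationOperator EQ "+">
        <cfset Expected = EquationValue1 + EquationValue2 />
    <cfelseif EquationOperator EQ "-">
        <cfset Expected = EquationValue1 - EquationValue2 />
    <cfelse>
        <cfthrow message="Unsupported operator." />
    </cfif>

    <!--- Only a numeric answer that matches counts as human. --->
    <cfif IsNumeric( FORM.de_spam_answer ) AND ( Val( FORM.de_spam_answer ) EQ Expected )>
        <cfset IsHuman = true />
    </cfif>

    <!--- Tampered, missing, or malformed data lands here. --->
    <cfcatch>
        <cfset IsHuman = false />
    </cfcatch>
</cftry>
```

Wrapping the whole check in cftry means a forged or missing de_spam value is treated the same as a wrong answer rather than raising an error page.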
This is not a fool-proof solution, as I keep saying. But it does have randomness that makes it harder to crack for a robot. Humans should find this easy to use, as long as they can do a little math. Sweet, simple, and effective.
Reader Comments
Hey, this is pretty nifty. I just tested it and took a look at your source code. Cool stuff. I have to admit, I messed up the math question the first time :) Hey, it's been a while since I have done math :)
If you really want to go that extra mile, you may want to add in some CSS inheritance:
.spam1 .spam2 { display: none }
.spam3 .spam4 { color: white; background-color: white; }
In that way, .spam2 is only hidden if it's inside of .spam1. Or, .spam4 is only white if it's inside of .spam3. Bots would have to implement a full-blown CSS parser to work around it.
You couldn't get too fancy (child/sibling selectors), due to shoddy CSS support in still-used browsers (*cough*Explorer*cough*), but it'll ratchet up the bar that much higher.
Or, for the truly insane, go with a Schneier-esque wheat-and-chaff presentation. Present 10 different equations, all similarly obfuscated, then a hint such as "what is the answer to the second blue equation?". (I've often wondered why this isn't done already with CAPTCHA.)
Rick, as always, you offer excellent insight and suggestions. I think the inheritance idea is fantastic. Let me work on implementing it. It would create a lot more classes as I would want to have multiple parent classes (otherwise it would defeat the randomness). But, still, definitely doable.
And don't forget:
http://www.thinkgeek.com/homeoffice/stickers/3185/
Very impressive. Math rocks!
Math does rock :) If only we all spoke math, we would never misunderstand each other ;)
I'm doing something similar on my blog's contact form, but I'm doing it with JavaScript. But the problem I'm running into is that spam bots often skip the form and post directly to the form processor. I'm thinking of the best way to make sure the form data actually comes from my form, otherwise forward to /dev/null
Jacob, to overcome this, I am posting the answer to the form as a hidden, encrypted field. That way, if the bot was to post directly to the form, not only would it have to post an answer, it would also have to know how to properly encrypt the data so that I could decrypt it on the server and check it against the provided answer.
I came up with a solution last night. I thought about doing what you did, Ben (encrypting the answer), but I ended up using ajax to set a session variable, and then in the form processor I check for that session variable, and if it's the right value. Time will tell if this will work well or not.
But I agree with your original post, captcha seems too cumbersome compared to this. Not to mention, I'm on a Linux server and I haven't been able to get the open source captcha components to work. I've heard that Alagad's works on headless Linux, but I'm too cheap to buy something like that.
To make it even easier for my users, I restricted the range of numbers to keep them small enough for simple math. And, I made the first number a smaller range than the second number (less than 10) to make it even easier.
Jacob, I like the idea of the AJAX stuff. I am not sure how well the spam bots handle JavaScript, so it might be one more level of protection to exclude non-JS-capable browsers.
As far as the numbers, I like the restriction. I currently restrict to 20, which is not too bad, but still messes some people up.
Jacob, why do you need to use ajax to set a session variable?
Andrew
OK, I get it!! On the assumption that the bot can't parse the JavaScript, the bot won't set the session variable while a user will!
This may be completely wrong but... why don't you use session comparisons?
For example; in a form have a hidden field with the current sessionID in it. When the form is submitted compare this value to the current sessionID, if matches execute as normal, if not throw away.
Now the bot will use its cached version of the form session data which when compared to the 'real' current session will be incorrect unless the bot spams you instantly. Setting a session timeout relatively low will help.
I understand this is not the most comprehensive solution but seems to have worked so far on my blog.
Or am I missing something really obvious?
Ads,
That is definitely a valid solution. The problem with it, while small, is that there are a good number of people (I don't want to say paranoid people, but...) that turn off their cookies. Without the cookies, the session is not easily held from page to page, and I don't want to have complicated session handling. I want those people to be able to submit forms as well.
While trying to use your function, this error comes up:
Element LIBRARY.TEXT is undefined in a Java object of type class [Ljava.lang.String; referenced as
The error occurred in C:\website\amncaptcha\DeSpamEquation.cfm: line 37
35 : // Add a hidden operator.
36 : LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
37 : LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "-" ) );
38 : LOCAL.Result.Append( "</span> " );
39 :
Ameen,
ToAsciiString() is another function I have in my library. Sorry it was not posted above. Here it is (you will have to tweak your code to call it as you probably do not have your UDFs broken up the way I do).
<cffunction
    name="ToAsciiString"
    access="public"
    returntype="string"
    output="no"
    hint="Returns the given string in ascii format. This can be used for making strings hard for web-spiders to read.">

    <!--- Define arguments. --->
    <cfargument name="Text" type="string" required="yes" />

    <cfscript>
        // Define the local scope.
        var LOCAL = StructNew();

        // Create a default safe string.
        LOCAL.AsciiText = "";

        // Loop over the characters in the string and convert to ascii.
        for (LOCAL.CharIndex = 1 ; LOCAL.CharIndex LTE Len( ARGUMENTS.Text ) ; LOCAL.CharIndex = (LOCAL.CharIndex + 1)){
            LOCAL.AsciiText = (LOCAL.AsciiText & "&##" & Asc( Mid( ARGUMENTS.Text, LOCAL.CharIndex, 1 ) ) & ";" );
        }

        // Return the new ascii string.
        return( LOCAL.AsciiText );
    </cfscript>
</cffunction>
Thanks Ben, it works perfectly.
Ameen,
Awesome. Glad to help :)
The problem I have with this approach is that it also bars people using screen readers, e.g. blind people using JAWS, or other mechanisms like that. When you block the bots, you also block the blind and people with other disabilities.
This can create financial liabilities, as the folks developing the Sydney2000 Olympics site learned to the tune of a $40,000 judgement plus legal costs when a blind user sued them under the anti-discrimination laws. With 6 weeks to go to the Olympics they had to rewrite much of the site.
So even if you don't think blind users amount to a significant proportion of your user base, you still have to watch out for anti-discrimination laws.
Anyway, I'd like to see if there can be a variation of this idea (which is a REALLY good idea by the way!) which would allow humans using screen readers to get around the blocking. Perhaps naming the fields something like "ThisfieldjustToTrickSpamBots_DoNotChange" or some such.
Mike, you raise excellent points. I am already trying to tackle these:
www.bennadel.com/index.cfm?dax=blog:405.view
Not quite there yet, but almost.
Contact the advertisers, not the sender. Provide them with the email and the full header.
Quite often they do not really understand they are supporting the spammers.
I think it would be funny to make the math a little harder, and put a calculator at the bottom for the user if they can't figure it out.. hehe
I messed up the math the first time too.. calculus > addition ... :)
When it comes to anti-discrimination, having this type of security wouldn't qualify as discrimination. If a person cannot operate the CAPTCHA or the anti-spam part of the website, you could always state in your TOS that, in the event that you are not able to access any portion of the website that you are allowed to access, you may contact us at email@whatever.com and someone would be glad to help you.
You could also state that this website is offered as-is, and that no help or guidance will be offered based solely on the administration and/or website owners discretion, or something to that effect.
That would eliminate any legal obligations that you would have.
Also, something you might want to note is that websites are extremely hard to sue when it comes to the way that they present their content, or do not present it. (You can't please everyone.)
The math is simple and sure I shall try this.