Preventing Spam Bot Form Submissions With ColdFusion (Revisited)

By Ben Nadel

Published 2007-08-27 in ColdFusion — Comments (26)

The other night, I was staring at my neighbor's lower-back tattoo when a new (to me) ColdFusion anti-spam technique popped into my head. I am sure that this is not new or unique; I just haven't explored it before. Until now, all of my ColdFusion-based Anti-Spam methodologies have relied on CSS and TimeStamp-based tomfoolery. Well, I don't know if it was the graphical nature of my neighbor's sacral inkage, but suddenly I had the idea of using Images.

Images on a web page do not get loaded with the initial page request. Instead, as the HTML is rendering, the client makes subsequent requests to the server to load linked items like Images, Javascript files, and Style Sheets. Can we use this multi-request paradigm to our advantage? I think so, at least with anyone who has a graphical browser.

The idea here is that Spam Bots probably don't ever render the form pages they spam; most likely, they just grab the HTML and then use that to programmatically submit the form. Because they never render the HTML page itself, they never make subsequent requests to the server to load images, stylesheets, and the like.

That's where this new plan comes into play. On the form page, we have an image tag that pings a ColdFusion page which causes some ID-based flag to be set. This ID is then also submitted with the form. When the request gets processed, you then check to see if both the flag and the form-submitted ID exist (and match). If they do, then it suggests the HTML page was rendered and that it was most likely not a bot.

To demonstrate, let's first look at the ColdFusion template that causes the server side flag to be set:

<!--- Kill extra output. --->
<cfsilent>

	<!--- Param the URL id. --->
	<cfparam
		name="URL.id"
		type="string"
		default=""
		/>


	<!--- Try to decrypt it and create a text file. --->
	<cftry>

		<!--- Decrypt the value. --->
		<cfset URL.id = Decrypt(
			URL.id,
			"that-is-tasty!",
			"CFMX_COMPAT",
			"HEX"
			) />

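		<!---
			Create the text file that will mark the form
			submission as valid. Just store it as an empty
			text file since all we are going to be doing
			is checking for its existence. Only do this if
			the decrypted value looks like a ColdFusion UUID
			so that junk input cannot alter the file path.
		--->
		<cfif IsValid( "uuid", URL.id )>

			<cffile
				action="write"
				file="#ExpandPath( './spam/#URL.id#.txt' )#"
				output=""
				/>

		</cfif>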


		<!--- Catch any errors. --->
		<cfcatch>

			<!--- Something went wrong. --->

		</cfcatch>
	</cftry>


	<!--- Return an empty image. --->
	<cfheader
		name="content-length"
		value="0"
		/>

	<cfcontent
		type="image/gif"
		reset="true"
		/>

</cfsilent>

As you can see, there is practically nothing going on here. When the request comes in, we decrypt the form ID in the URL scope. We then create an empty text file based on this form ID. This could just as easily have been an APPLICATION-scoped variable or something, but I figured this would be easier on the server's memory.
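
For the curious, the APPLICATION-scoped variation might look something like the rough sketch below. This is not part of the demo: it assumes the pages run inside a named application (so that the APPLICATION scope exists), and APPLICATION.formFlags and isRendered are names made up purely for illustration.

```cfm
<!---
	HYPOTHETICAL: would replace the CFFILE write in the
	template above. APPLICATION.formFlags is an illustrative
	name. URL.id has already been decrypted at this point.
--->
<cflock scope="application" type="exclusive" timeout="5">

	<!--- Create the flag collection the first time through. --->
	<cfif NOT StructKeyExists( APPLICATION, "formFlags" )>
		<cfset APPLICATION.formFlags = StructNew() />
	</cfif>

	<!--- Flag this form ID as having been rendered. --->
	<cfset APPLICATION.formFlags[ URL.id ] = Now() />

</cflock>


<!---
	HYPOTHETICAL: would replace the FileExists() check on
	the form page.
--->
<cflock scope="application" type="readonly" timeout="5">

	<cfset isRendered = (
		StructKeyExists( APPLICATION, "formFlags" ) AND
		StructKeyExists( APPLICATION.formFlags, FORM.form_id )
		) />

</cflock>
```

The trade-off is that every flag now sits in memory until something removes it or the application restarts, which is why the demo sticks with empty text files.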

Now that we understand how the server-side, ID-based flag is being set, let's take a look at the Form page:

<!--- Kill extra output. --->
<cfsilent>

	<!--- Param form comments. --->
	<cfparam
		name="FORM.comments"
		type="string"
		default=""
		/>

	<!---
		Param the form ID. This is the value that we
		will use to check proper form submission (to
		protect against SPAM form submissions).
	--->
	<cfparam
		name="FORM.form_id"
		type="string"
		default=""
		/>

	<!--- Param the form submission. --->
	<cftry>
		<cfparam
			name="FORM.submitted"
			type="numeric"
			default="0"
			/>

		<cfcatch>
			<cfset FORM.submitted = 0 />
		</cfcatch>
	</cftry>


	<!--- Check to see if the form has been submitted. --->
	<cfif FORM.submitted>

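		<!---
			Check to see if the FORM is valid by checking to
			see if the ks_stats.cfm file spawned a file with
			the given ID. Only accept IDs that look like a
			ColdFusion UUID so that a crafted value (such as
			"../robots") cannot point the check at some other
			existing .txt file.
		--->
		<cfif (
			IsValid( "uuid", FORM.form_id ) AND
			FileExists( ExpandPath( "./spam/#FORM.form_id#.txt" ) )
			)>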

			<!---
				The file exists. This confirms that the FORM
				page was actually loaded and spawned a second
				IMG request that then spawned this text file.
				This is probably NOT a spam bot.
			--->
			<cflocation
				url="confirm.cfm"
				addtoken="false"
				/>

		</cfif>

	</cfif>


	<!---
		If we have made it this far, then we are going
		to be showing the FORM again. Select a new form
		ID for this display.
	--->
	<cfset FORM.form_id = CreateUUID() />

	<!---
		Now that we have our form ID, let's encrypt it
		so that we don't have duplicate values in the body
		(that might be a detectable pattern for a BOT).
	--->
	<cfset FORM.encrypted_form_id = Encrypt(
		FORM.form_id,
		"that-is-tasty!",
		"CFMX_COMPAT",
		"HEX"
		) />

</cfsilent>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
	<title>ColdFusion Anti Form Spam Idea</title>
</head>
<body>

	<cfoutput>

		<form action="#CGI.script_name#" method="post">

			<!--- This will flag form submission. --->
			<input
				type="hidden"
				name="submitted"
				value="1"
				/>

			<!--- This is the form ID. --->
			<input
				type="hidden"
				name="form_id"
				value="#FORM.form_id#"
				/>


			<label for="comments">
				Comments:
			</label>

			<textarea
				id="comments"
				name="comments"
				cols="50"
				rows="10"
				>#FORM.comments#</textarea>


			<input type="submit" value="Submit Comments" />

		</form>


		<!---
			This is the image that we will use to make sure
			the HTML of the current form page actually renders.
			I am calling it "ks_stats" just to make it less
			obvious to prying eyes.
		--->
		<img
			src="ks_stats.cfm?id=#FORM.encrypted_form_id#"
			height="1"
			width="1"
			style="display: none ;"
			/>

	</cfoutput>

</body>
</html>

If you look at the bottom of the page, you will see that I have an invisible IMG tag that pings our ks_stats.cfm file (the first file shown above) using an encrypted version of the form ID. I have called it ks_stats.cfm just to disguise it. I have also encrypted the ID so that it would be a harder pattern to pick up. This Ping triggers the previously discussed server-side flag to be set. Once the form gets submitted, we then just check to see if the text file (our server-side flag) exists. If it does, then we are deciding that the submitter is NOT a bot. If it doesn't exist, then we are saying that the user IS a bot.
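
One thing the demo never does is remove the flag once it has been used, so the same form ID would keep passing the check on later submissions. A minimal sketch of consuming the flag might look like the following. The flagPath variable is a name made up for illustration, and the IsValid() guard is only there to keep anything that doesn't look like a ColdFusion UUID out of the file path.

```cfm
<!--- HYPOTHETICAL variation on the FileExists() check. --->
<cfset flagPath = ExpandPath( "./spam/#FORM.form_id#.txt" ) />

<cfif (IsValid( "uuid", FORM.form_id ) AND FileExists( flagPath ))>

	<!--- Consume the flag so this form ID cannot be reused. --->
	<cffile
		action="delete"
		file="#flagPath#"
		/>

	<cflocation
		url="confirm.cfm"
		addtoken="false"
		/>

</cfif>
```

Flags for forms that were rendered but never submitted would still pile up in the spam directory, so some kind of periodic cleanup would be needed on top of this.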

Of course, this is not fool-proof. If a user has their images turned off or they have a text-based browser, then they might be legitimate and yet still classed as a BOT. But then again, all ColdFusion anti-spam techniques that are not 100% content-based are going to have similar trade-offs. I am not saying that this is the best way to perform anti-spam functions. Heck, I am not sure this is even a GOOD way. All I am saying is that it occurred to me. And, chances are, if people are going to have restricted browsers, they are probably going to have Javascript turned off before they start blocking images (that is just my theory).

Short link: https://bennadel.com/929

Reader Comments

Dustin Aug 27, 2007 at 4:39 PM

42 Comments

Not a bad tactic there. Although it might become quite a bit harder to figure out if you use it for a real graphic in the site layout. I just say that because if I were to reverse engineer your form submission process and I saw that image tag in the source, that would be the first place I'd explore when my process failed.

That aside, why not use a session variable on the receiving page? I'd imagine you could get some odd concurrency issues if you always wrote to the same file, and if you were to use different files for each post, you'd have to write a GC process as well.

Ben Nadel Aug 27, 2007 at 4:45 PM

15,996 Comments

@Dustin,

Yeah, certainly you could do a SESSION variable as well. I try to keep this stuff as low-level as possible so that it is more flexible for anyone who would want to use it. Plus, I figure if someone DOESN'T:

- Have / accept cookies
- Have Javascript enabled

... then this tactic would still work. But also, not sure if this is even a good thing to do.

Todd Rafferty Aug 27, 2007 at 5:55 PM

218 Comments

Seems like a lot of overhead. I'd want to see the performance on this.

Ben Nadel Aug 27, 2007 at 6:01 PM

15,996 Comments

@Todd,

Are you talking about performance in the sense of Effectiveness? Or Load on the server?

Todd Rafferty Aug 27, 2007 at 6:37 PM

218 Comments

Both. You're doing a fileWrite()/fileExists(). I don't think you're meaning for that to be part of the end design of the program, but still I think there's a performance impact there.

Tom K Aug 28, 2007 at 4:57 AM

8 Comments

You're forgetting accessibility - what if I have images turned off? what if I'm using a Braille reader (which may not bother with loading images)? :)

Sebastiaan Aug 28, 2007 at 6:23 AM

61 Comments

How about just performing a serverside check using CFFormProtect, free and downloadable from Riaforge?

Since implementing it I haven't had a single Spambot getting through ;-)

Ciqala Aug 28, 2007 at 6:59 AM

5 Comments

There is a project that has been in the making for a couple of years now that uses pictures of animals to prevent auto form-submission that is very interesting.

the project page is http://www.thepcspy.com/kittenauth

with an example of it on the same site's contact page http://www.thepcspy.com/contact

Ben Nadel Aug 28, 2007 at 7:21 AM

15,996 Comments

@Todd,

Yeah, the file writing / checking would have some overhead, but this could just as easily have been done with an APPLICATION scope and StructKeyExists() which would be just this side of instantaneous. The idea was more about the multi-request nature of a graphical site - the actual implementation could be changed.

@Tom,

I am not forgetting about the blind... I am just calling them spammers :) I talked about the pros/cons above. I know this is not a perfect solution.

@Sebastiaan,

Yes, I have heard nothing but GOOD things about CFFormProtect. But I look at the code for it and it's sooo long. I know that it's probably the best way to go, but then I wouldn't have as much fun hacking my own stuff together :)

@Ciqala,

I like the pictures, but remember, the main idea here is to get the end user to think less, not more. Having to click pictures requires that 1) they understand what the different animals are called and 2) have to stop and think about it. I want less thinking.

Sebastiaan Aug 28, 2007 at 7:24 AM

61 Comments

The nice thing about CFFormProtect is that it registers if people have used a mouse to click in the formfield or used a keyboard to type something. Furthermore, it introduces a hidden form field that purposefully has to be left blank to pass a server-side test. As most bots just grab the form-fields and programmatically fill in ALL fields, including the hidden *empty* fields, they fail the test.

I myself for a long time wanted to implement something like a CAPTCHA to prevent spam-bots flooding me. But I felt it wasn't user-friendly (usability) nor accessible. A serverside check is the way to go, and when CFFormProtect was suggested to me by a friend - including a call to CFAkismet if you have an API-key, just flag the value in the config file - I implemented it instantly. 30 minutes later the first mails poured in reporting to me what the SPAM-bot had tried to post (with a full CF-dump of the submitted form info and then some - excellent as a back-up if someone's comment is mistakenly marked as spam).

So instead of trying to fix the front-end, implement the fix in the back-end ;-)

Matt Osbun Aug 28, 2007 at 8:22 AM

20 Comments

I'm almost afraid to suggest this...

But have you seen HotCaptcha?

www.hotcaptcha.com

Todd Rafferty Aug 28, 2007 at 8:27 AM

218 Comments

@Matt, That's so wrong on so many levels. LOL.

Ben Nadel Aug 28, 2007 at 8:38 AM

15,996 Comments

@Matt,

I think Todd is confusing "wrong" with "Brilliant!".

Sweet! I finally did it! "Correct! You must be human." Took me a few tries ;)

Justin Aug 28, 2007 at 8:48 AM

74 Comments

Hey Ben,

What about creating a UUID with every page load that contains a form. Store the UUID in a database and put the UUID in a hidden form field. When the form is submitted check and delete the UUID from the database or ignore the form submission. You could even insert a number of tries based on the page load or even a timeout through a timestamp.

This would only allow spammers to come to your site and manually submit as the html would have to load.

Adam Fairbanks Aug 28, 2007 at 10:53 AM

4 Comments

I find that robust server-side form validation screens out most automated form submits. The bots usually trip on at least one item: valid email address, valid phone, valid zip, maxlength, required field, field type (e.g., numeric, date), etc. So the bot gets an error message from the server-side form validation, and the form is never submitted.

A required blank hidden field sounds like a good, easy, and unobtrusive thing to add. Combined with a required field, I wonder if bots would be smart enough to fill in the required field and not fill in the required blank field.

Ben Nadel Aug 28, 2007 at 6:29 PM

15,996 Comments

@Adam,

Agreed. Since there are so many different browser capabilities out there, server side is really the only cross-browser compliant way to do validation.

Mike Sep 9, 2007 at 5:25 PM

3 Comments

Why not just do something like this?
<cfset userAgent = CGI.HTTP_USER_AGENT>
<cfif NOT Find( "Mozilla", userAgent )>
<cflocation url="index.cfm?FuseAction=Main&m=0">
<cfabort>
</cfif>

Most bots I have seen don't have the USER_AGENT filled or have junk in it.

Ben Nadel Sep 9, 2007 at 5:34 PM

15,996 Comments

@Mike,

I think that might be a bit too big of an assumption for me.

Mike Sep 9, 2007 at 6:02 PM

3 Comments

Ben,

Understood, but on my sites, using this has stopped about 99% of the bots from hitting my forums.

Mike

Ben Nadel Sep 9, 2007 at 6:05 PM

15,996 Comments

@Mike,

I will look into this. My only concern is that I also know there are some users whose firewalls will strip out all the CGI information from a request. So much of the trouble with FORM SPAM is not necessarily blocking the bots - it's trying NOT to block real life users.

Mike Sep 10, 2007 at 12:05 AM

3 Comments

Interesting, I have not run into that issue yet.

Frank Marion Mar 6, 2008 at 8:25 PM

2 Comments

I have a very simple technique that works. It requires no javascript or complicated server weirdness, is fully accessible, and has low overhead. It does, however, require that the user fill out one field with a number, so it's good for anti-bot but not for humans. In three years, I have yet to receive *any* spam on any of the many sites that I have installed this on. Sites that were being bombarded with hundreds of spams daily suddenly became quiet, and good emails get through. It takes moments to install and has negligible overhead.

Here is the barebones version.

<cfparam name="session.chk_rand" default="#NumberFormat(RandRange(0, 9999),'0000')#">

#session.chk_rand# enter this number here -></string><input name="spmchck" type="text" size="4" maxlength="4"/>

<cfparam name="form.spmchck" default="">
<cfif form.spmchck NEQ session.chk_rand>
<cfoutput>Some human readable validation message "Sorry, you need to fill in the following fields..."</cfoutput>
<cfset x = StructDelete(Session, "chk_rand")>
<cfabort>
</cfif>

All you are doing is this:

Generating a random number
Set it to a session variable
Display the number next to a field in the form, get the user to copy it over.
Check that the form.value and that the session.value are equal to each other.

If the two numbers equal each other, then it passes, if not, the session value with the random number is deleted (and thus the next attempt gives you a new random number), then the template is aborted. One could, if one were so inclined add logging and perhaps even a honey pot link or email. One could conceivably extend this to use alpha characters, or even behind the scenes arithmetic and so on.

A bot won't know it, and a human has only a very simple task to perform. And it changes every time the form is accessed, so even if they do it manually, it's labour intensive. Those with javascript enabled will get a lovely little message, and never have to see the server side validation. You can be more sophisticated about it, but this is basically the same notion of captcha.

As simple as it gets.

Frank Marion Mar 6, 2008 at 8:29 PM

2 Comments

Something odd happened to the code above. Reposted here with no bold tags

<cfparam name="session.chk_rand" default="#NumberFormat(RandRange(0, 9999),'0000')#">

#session.chk_rand# enter this number here -><input name="spmchck" type="text" size="4" maxlength="4"/>