Tracking Request Volume Based On IP Addresses In ColdFusion

By Ben Nadel

Published 2009-11-04 in ColdFusion — Comments (11)

This is much less a "how to" post and much more a "thinking out loud" post. I don't really know if I even like what I came up with; but, I figured I would put it out here in case it led to some good conversations. In the wake of some spam comments on my site, I started to think about tracking the IP addresses of requests in ColdFusion in order to see if a given IP address was making too many requests in too short a time period.

This kind of task actually has some very interesting problems to solve:

We want to track request volume by IP address.
We only want to track volume over a given period of time.
We want to efficiently dereference old tracking information.
We want to make this all thread-safe, but fast.

To start with, let me just say I ignored thread-safety for the time being; I figured that could be re-addressed later on (this was more an exploration). What was more interesting to me was the idea of tracking volume for a given time period and efficiently de-referencing old tracking data. Because we only want to look at request volume for a given time period, we can't simply keep a running sum of requests for each IP address; doing that would prevent us from constantly moving the viewing window forward without corrupting the existing aggregates. We also don't want to have to loop over hundreds (or even thousands) of items looking for data as this would probably not be performant.

To account for a valid time period and efficient de-referencing, I came up with the idea of using time offset buckets. By grouping IP tracking into buckets, we could store IP-based aggregates while still being able to dump old tracking information without corruption. Also, by storing time offset data in the bucket itself, rather than with the IP value, we don't have to loop over IP values - of which there could be thousands - looking for outdated tracking information.

Tracking IP Request Volume In ColdFusion Using Bucket.

With this approach, IP keys will get duplicated in each bucket (in which they made a request); but, by using this approach, the most processing we'll ever have to do is looping over the buckets, of which there will be a small and constant number. As such, the only scaling issues here would be RAM availability and the speed of hash table lookups; meaning, the number of requests shouldn't have too much of an impact on how well this runs.
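To make the bucket idea a bit more concrete, here is a minimal sketch of how a few requests would land in buckets. The sample offsets (seconds since tracking started), the hard-coded IP address, and the 3-second bucket duration are assumptions chosen purely for illustration:

```cfm
<!--- Hypothetical values, chosen only for illustration. --->
<cfset bucketDuration = 3 />
<cfset offsets = [ 1, 4, 5 ] />
<cfset ip = "127.0.0.1" />
<cfset buckets = [] />

<cfloop index="offset" array="#offsets#">

	<!--- Round the offset down to the start of its time bucket. --->
	<cfset bucketKey = ( offset - ( offset mod bucketDuration ) ) />

	<!--- Start a new bucket whenever the key changes (newest first). --->
	<cfif ( !arrayLen( buckets ) || ( buckets[ 1 ].key neq bucketKey ) )>
		<cfset bucket = { key = bucketKey, ips = {} } />
		<cfset arrayPrepend( buckets, bucket ) />
	</cfif>

	<!--- Count the hit for this IP inside the newest bucket only. --->
	<cfif !structKeyExists( buckets[ 1 ].ips, ip )>
		<cfset buckets[ 1 ].ips[ ip ] = 0 />
	</cfif>
	<cfset buckets[ 1 ].ips[ ip ]++ />

</cfloop>

<cfdump var="#buckets#" label="Buckets" />
```

Here, the offset of 1 second rounds down to bucket key 0, while the offsets of 4 and 5 seconds both round down to bucket key 3. So the dump should show two buckets: the newest (key 3) with a count of 2 for the IP, and the older one (key 0) with a count of 1. Dropping old data then means deleting whole buckets from the end of the array, never touching individual IP keys.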

Before we look at the ColdFusion component that contains this business logic, let's take a quick look at the calling code (simply an API demo, not a real-world use case):

<!--- Create the IP tracker. --->
<cfset tracker = createObject( "component", "IPTracker" ).init() />

<!--- Track several IP requests. --->
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />

<!--- Pause the thread (to simulate over-time requests). --->
<cfthread
	action="sleep"
	duration="#(1000 * 5)#"
	/>

<!--- Track several IP requests. --->
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />

<br />

<!--- Output the bucket data. --->
<cfdump
	var="#tracker.getBuckets()#"
	label="Buckets Data"
	/>

Here, we are creating the tracker and then logging many requests to it. The trackIP() method tracks the given IP address and then returns a True / False value as to whether or not the given IP has reached the request volume threshold. Running the above code, we get the following output:

Tracking IP Address Request Volume In ColdFusion.

As you can see, the IP tracking, even over the few seconds in this demo, has been split up into two buckets, each containing a sub-aggregate for each IP address for that time period.
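Outside of a demo, the boolean returned by trackIP() would probably be checked once per request. The following is only a rough sketch of what that might look like in an Application.cfc; the application name, the choice to cache the tracker in the application scope, and the 429 response are all assumptions on my part, not something I have running in production:

```cfm
<cfcomponent output="false">

	<!--- Hypothetical application name. --->
	<cfset this.name = "IPTrackerDemo" />

	<cffunction name="onApplicationStart" access="public" returntype="boolean" output="false">

		<!--- Cache one tracker so that counts persist across requests. --->
		<cfset application.tracker = createObject( "component", "IPTracker" ).init() />

		<cfreturn true />
	</cffunction>

	<cffunction name="onRequestStart" access="public" returntype="boolean" output="false">
		<cfargument name="targetPage" type="string" required="true" />

		<!--- Track this request and bail out if the IP is over the limit. --->
		<cfif application.tracker.trackIP( cgi.remote_addr )>
			<cfheader statuscode="429" statustext="Too Many Requests" />
			<cfabort />
		</cfif>

		<cfreturn true />
	</cffunction>

</cfcomponent>
```

Keep in mind that a shared tracker like this gets hit by concurrent requests, so the thread-safety question becomes a real one as soon as it lives in the application scope.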

Now that you kind of see what's going on, let's take a look at the IPTracker.cfc ColdFusion component:

IPTracker.cfc

<cfcomponent
	output="false"
	hint="I track IP addresses over time for abuse protection.">


	<cffunction
		name="init"
		access="public"
		returntype="any"
		output="false"
		hint="I initialize this component.">

		<!---
			I contain a collection of all the time frame
			buckets that are being accounted for. Each bucket is
			a hash of all the IP addresses that were tracked in
			the time frame represented by each bucket.
		--->
		<cfset variables.buckets = [] />

		<!---
			I am the time span in seconds of each bucket. The
			smaller the bucket duration, the more accurate, but
			the more memory / processing required to keep track.
		--->
		<cfset variables.bucketDuration = 3 />

		<!---
			I am the time span in seconds of valid tracking.
			Meaning, I am the amount of time a request from a
			given IP address will be kept.
		--->
		<cfset variables.trackingDuration = 60 />

		<!---
			I am the threshold sum of all the requests made by a
			given IP address within the given valid duration. Once
			this threshold has been reached, requests made by a
			given IP address will be considered invalid.
		--->
		<cfset variables.validThreshold = 5 />

		<!---
			I am the number of buckets we need to keep based on
			the time spans above.
		--->
		<cfset variables.bucketCount = ceiling(
			variables.trackingDuration / variables.bucketDuration
			) />

		<!---
			I am the starting point in time for the tracking.
			This will be used to figure out the bucket keys.
		--->
		<cfset variables.startTime = now() />

		<!--- Return this component reference. --->
		<cfreturn this />
	</cffunction>


	<cffunction
		name="getBucketKey"
		access="public"
		returntype="numeric"
		output="false"
		hint="I return the bucket key based on the given time.">

		<!--- Define arguments. --->
		<cfargument
			name="timeStamp"
			type="date"
			required="true"
			hint="I am the time being translated into a bucket key."
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!---
			Get the number of seconds that have passed since
			the tracking started.
		--->
		<cfset local.offset = dateDiff(
			"s",
			variables.startTime,
			arguments.timeStamp
			) />

		<!--- Create the bucket key. --->
		<cfset local.bucketKey = (
			local.offset -
			(local.offset mod variables.bucketDuration)
			) />

		<!---
			Check to see if the difference since tracking was
			started is more than a day. If so, then reset it
			(simply to keep the number from getting too large).

			NOTE: The key itself is not really that critical.
		--->
		<cfif (local.offset gt 86400)>

			<!--- Reset the start time for next run. --->
			<cfset variables.startTime = now() />

		</cfif>

		<!--- Return the bucket key. --->
		<cfreturn local.bucketKey />
	</cffunction>


	<cffunction
		name="getBuckets"
		access="public"
		returntype="array"
		output="false"
		hint="I return the buckets.">

		<cfreturn variables.buckets />
	</cffunction>


	<cffunction
		name="hasIPReachedThreshold"
		access="public"
		returntype="boolean"
		output="false"
		hint="I determine if the given IP address has reached the request threshold.">

		<!--- Define arguments. --->
		<cfargument
			name="ip"
			type="string"
			required="true"
			hint="I am the IP address being tracked."
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!---
			We need to sum the hit count for this IP address
			within the tracked time frame.
		--->
		<cfset local.hitCount = 0 />

		<!--- Loop over the buckets to sum hits. --->
		<cfloop
			index="local.bucket"
			array="#variables.buckets#">

			<!---
				Check to see if the IP exists in this bucket
				(which would indicate a hit count).
			--->
			<cfif structKeyExists( local.bucket.ips, arguments.ip )>

				<!--- Add the count to the running total. --->
				<cfset local.hitCount += local.bucket.ips[ arguments.ip ] />

			</cfif>

		</cfloop>

		<!---
			Now that we have summed the hit count for the given
			IP address within the tracked time frame, return
			whether or not the total has exceeded the threshold
			(the comparison is strictly greater-than).
		--->
		<cfreturn (local.hitCount gt variables.validThreshold) />
	</cffunction>


	<cffunction
		name="paramFirstBucket"
		access="public"
		returntype="void"
		output="false"
		hint="I param the first bucket.">

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Get the bucket key for this time. --->
		<cfset local.bucketKey = this.getBucketKey( now() ) />

		<!---
			Check to see if we need to create a new bucket for the
			given key. New buckets will always be prepended to the
			buckets collection, so all we need to do is check for
			the first bucket.
		--->
		<cfif (
			!arrayLen( variables.buckets ) ||
			(variables.buckets[ 1 ].key neq local.bucketKey)
			)>

			<!---
				Create a new bucket to prepend. Each bucket needs
				a key as well as a hash of the IPs being tracked.
			--->
			<cfset local.bucket = {
				key = local.bucketKey,
				ips = {}
				} />

			<!--- Prepend the bucket. --->
			<cfset arrayPrepend( variables.buckets, local.bucket ) />

			<!---
				Now that we have augmented the bucket collection,
				check to see if we now have more buckets than we
				want to be tracking.
			--->
			<cfif (arrayLen( variables.buckets ) gt variables.bucketCount)>

				<!---
					The buckets at the end are now beyond the time
					span of our valid IP tracking. Delete them.
				--->
				<cfloop
					index="local.bucketIndex"
					from="#arrayLen( variables.buckets )#"
					to="#(variables.bucketCount + 1)#"
					step="-1">

					<!--- Delete last bucket. --->
					<cfset arrayDeleteAt(
						variables.buckets,
						local.bucketIndex
						) />

				</cfloop>

			</cfif>

		</cfif>

		<!--- Return out. --->
		<cfreturn />
	</cffunction>


	<cffunction
		name="trackIP"
		access="public"
		returntype="boolean"
		output="false"
		hint="I track the given IP address and return a boolean as to whether the given IP address has exceeded the threshold.">

		<!--- Define arguments. --->
		<cfargument
			name="ip"
			type="string"
			required="true"
			hint="I am the IP address being tracked."
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!---
			Param the first bucket (as this is the one in which we
			will be taking immediate action).
		--->
		<cfset this.paramFirstBucket() />

		<!---
			ASSERT: At this point, we know that we have a first
			bucket and that it is the bucket we want to take
			direct action on.
		--->

		<!--- Param the IP address in the bucket. --->
		<cfparam
			name="variables.buckets[ 1 ].ips[ arguments.ip ]"
			type="numeric"
			default="0"
			/>

		<!--- Increment the IP hit count in this bucket. --->
		<cfset variables.buckets[ 1 ].ips[ arguments.ip ]++ />

		<!---
			Now that we have updated the buckets, return whether
			or not the IP address has reached the threshold.
		--->
		<cfreturn this.hasIPReachedThreshold( arguments.ip ) />
	</cffunction>

</cfcomponent>

So anyway, that's what I've got. Again, this was not so much a solution as it was just (hopefully) the basis of some good conversation. I understand that much more thread safety thinking has to be put into place; I left that out of this in order to keep things simple.
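As a starting point for the thread-safety discussion, one very blunt option would be to serialize all tracking calls behind a single named lock. This is only a sketch under assumptions: the SafeIPTracker.cfc file name and the lock name are hypothetical, and a single exclusive lock trades away some of the speed I was hoping to keep:

```cfm
<!--- SafeIPTracker.cfc (hypothetical name): a locking wrapper around IPTracker. --->
<cfcomponent
	extends="IPTracker"
	output="false"
	hint="I serialize calls to trackIP() using an exclusive named lock.">

	<cffunction
		name="trackIP"
		access="public"
		returntype="boolean"
		output="false"
		hint="I track the given IP address inside an exclusive lock.">

		<cfargument
			name="ip"
			type="string"
			required="true"
			hint="I am the IP address being tracked."
			/>

		<!---
			Only one request at a time can update the buckets. If the
			lock cannot be obtained within 5 seconds, an error is thrown.
		--->
		<cflock
			name="IPTracker.trackIP"
			type="exclusive"
			timeout="5"
			throwontimeout="true">

			<cfreturn super.trackIP( arguments.ip ) />

		</cflock>
	</cffunction>

</cfcomponent>
```

Since init() returns "this", creating the component with createObject( "component", "SafeIPTracker" ).init() should work the same way as before. Note that getBuckets() is still unlocked here, and that a named lock is shared by every instance that uses the same name, so this is a conversation starter rather than a finished design.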

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/1746

Reader Comments

Matt Woodward Nov 4, 2009 at 3:12 PM

12 Comments

You might check out Open BlueDragon's CFTHROTTLE tag for some additional implementation ideas:
http://wiki.openbluedragon.org/wiki/index.php/CFTHROTTLE

Ben Nadel Nov 4, 2009 at 3:28 PM

15,996 Comments

@Matt,

Looks interesting - definitely the same concept (tracking hits per IP per time period). Is there a way to look at the source code they have? Or is that all in Java classes?

Matt Woodward Nov 4, 2009 at 4:01 PM

12 Comments

Yep, that's what open source is all about. ;-) Since this is native to the engine it's in Java of course, but if you grab the source and navigate to com/naryx/tagfusion/cfm/tag/ext/cfTHROTTLE.java you can check things out.

Allen Nov 4, 2009 at 4:18 PM

26 Comments

I'm curious.... how many requests are too many?

shuns Nov 4, 2009 at 11:58 PM

76 Comments

Hey Ben,

Yeah, good post. I had run into the same problem, and this is a code sample of what I have been using successfully to handle the issue for quite some time now with no problems.

Note this is a somewhat hacked excerpt of my request facade object (so some of the generic methods are inherited but not shown, as well as some other code being called, but only to get some mentioned config vars), but it at least gives you an idea.

This works via a struct (fast) keyed on ip address, then an array of times per key. It is thread safe also.

Let me know if you have any questions and keep up the great blog :)

<!---

values currently returned from getEngine().getGeneral()

requestsResetTime = 24 
requestsTimeframe = 10 
requestsInTimeframeByIpMax = 50

Other code is removed to make this easier to follow

--->

<cffunction name="getRequests" access="public" returntype="struct" output="false">
	<cflock type="readonly" name="#getUniqueLockId('requests')#" timeout="10" throwontimeout="true">
		<cfreturn variables.instance.requests />
	</cflock>
</cffunction>

<cffunction name="setRequests" access="private" returntype="void" output="false">
	<cfargument name="requests" type="struct" required="true" />

	<cflock type="exclusive" name="#getUniqueLockId('requests')#" timeout="10" throwontimeout="true">
		<cfset variables.instance.requests = arguments.requests />
	</cflock>
</cffunction>

<cffunction name="onRequestStart" access="public" output="false" returntype="void"
	hint="Should be fired when the request starts from the Application bootstrapper.">
	<cfargument name="targetPage" type="string" required="true" />

	<cfset purgeRequestsIfNeeded() />
	<cfset manageRequests() />

Ben Nadel Nov 5, 2009 at 8:23 AM

15,996 Comments

@Shuns,

It looks cool, but I am not seeing where you purge old IP request data. For example, say someone hits your site, and then doesn't come back - where is that IP address eventually purged from your tracking?

Travis Nov 5, 2009 at 12:49 PM

3 Comments

Fancy!

Last time I tried blocking spam with IP traffic it was with CF5. I don't remember the exact code I used; I no longer maintain the site or have access to the code.

Tracked IPs were kept in an application scope and dumped every 5 minutes. Each IP had a startTrackingTime and an IPHitCount; if the IP hit more than a certain frequency, it was removed from tracking and placed in a banned list. The frequency was checked by something like <cfif IPHitCount / dateDiff("s", startTrackingTime, now()) gt 5>ban the IP</cfif>

Banned IPs were simply <cfabort>ed.

shuns Nov 5, 2009 at 4:39 PM

76 Comments

Hi Ben,

All requests are purged if needed on request start via purgeRequestsIfNeeded()
(Which in turn fires resetRequests() if conditions are met)

Also, within manageRequests(), which is the overall checking method, individual ones are removed if needed via deleteRequestByIp(ip, index)

Does that help?

Ben Nadel Nov 5, 2009 at 7:14 PM

15,996 Comments

@Shuns,

Hmmm, I think I am missing something; but no worries - I get the general drift.

shuns Nov 5, 2009 at 7:31 PM

76 Comments

@Ben

Well, it goes like this: every request fires a method that will purge/reset the whole struct if needed; then, after that occurs, it will check the ip etc, removing it from the struct if it should be ejected. It then records this new request, and finally checks to see if the ip has reached a limit; if so, it throws an error.

HTH

Allen Nov 9, 2009 at 1:06 PM

26 Comments

This seems like something some open source message boards would already be doing to address getting spammed. Anyone know of one off hand that may have done this?
