Tracking Request Volume Based On IP Addresses In ColdFusion
This is much less a "how to" post and much more a "thinking out loud" post. I don't really know if I even like what I came up with, but I figured I would put it out here in case it leads to some good conversations. In the wake of some spam comments on my site, I started to think about tracking the IP addresses of requests in ColdFusion in order to see whether a given IP address was making too many requests in too short a time period.
This kind of task actually has some very interesting problems to solve:
- We want to track request volume by IP address.
- We only want to track volume over a given period of time.
- We want to efficiently dereference old tracking information.
- We want to make this all thread-safe, but fast.
To start with, let me just say I ignored thread-safety for the time being; I figured that could be re-addressed later on (this was more an exploration). What was more interesting to me was the idea of tracking volume for a given time period and efficiently de-referencing old tracking data. Because we only want to look at request volume for a given time period, we can't simply keep a running sum of requests for each IP address; a single running total has no record of when each hit happened, so we couldn't keep moving the viewing window forward without corrupting the existing aggregates. We also don't want to loop over hundreds (or even thousands) of items looking for stale data on every request, as that would probably not perform well.
To account for a valid time period and efficient de-referencing, I came up with the idea of using time offset buckets. By grouping IP tracking into buckets, we could store IP-based aggregates while still being able to dump old tracking information without corruption. Also, by storing time offset data in the bucket itself, rather than with the IP value, we don't have to loop over IP values - of which there could be thousands - looking for outdated tracking information.
With this approach, an IP key gets duplicated in every bucket in which it made a request. The trade-off is that the most processing we'll ever have to do is loop over the buckets, of which there is a small, fixed maximum (the tracking duration divided by the bucket duration). As such, the only scaling issues here would be RAM availability and the speed of hash table lookups; meaning, the number of requests shouldn't have too much of an impact on how well this runs.
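To make the bucket idea a little more concrete, here is a minimal sketch of the key arithmetic: each request's offset (in seconds since tracking started) is rounded down to the start of its bucket. The offsets below are made-up sample values, and the 3-second bucket duration is just an assumption that mirrors the default used later in the component.

```cfml
<!--- ASSUMPTION: A 3-second bucket, matching the component's default. --->
<cfset bucketDuration = 3 />

<!--- Hypothetical offsets, in seconds, since tracking started. --->
<cfset offsets = [ 0, 1, 2, 3, 5, 7 ] />

<cfoutput>
	<cfloop index="offset" array="#offsets#">

		<!--- Round the offset down to the start of its bucket. --->
		<cfset bucketKey = ( offset - ( offset mod bucketDuration ) ) />

		Offset #offset# falls in bucket #bucketKey#<br />

	</cfloop>
</cfoutput>
```

With these sample values, offsets 0 through 2 share bucket 0, offsets 3 and 5 share bucket 3, and offset 7 lands in bucket 6. Every hit inside the same 3-second slice increments the same per-IP counter, and a whole slice can later be dropped in one step.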
Before we look at the ColdFusion component that contains this business logic, let's take a quick look at the calling code (simply an API demo, not a real-world use case):
<!--- Create the IP tracker. --->
<cfset tracker = createObject( "component", "IPTracker" ).init() />
<!--- Track several IP requests. --->
<cfoutput>
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
</cfoutput>
<!--- Pause the thread (to simulate over-time requests). --->
<cfthread
action="sleep"
duration="#(1000 * 5)#"
/>
<!--- Track several IP requests. --->
<cfoutput>
#tracker.trackIP( cgi.remote_addr )#<br />
#tracker.trackIP( cgi.remote_addr )#<br />
<br />
</cfoutput>
<!--- Output the bucket data. --->
<cfdump
var="#tracker.getBuckets()#"
label="Buckets Data"
/>
Here, we are creating the tracker and then logging many requests to it. The trackIP() method tracks the given IP address and then returns a True / False value as to whether or not the given IP has reached the request volume threshold. When we run the above code, the dump of the bucket data shows that the IP tracking, even over the few seconds in this demo, has been split up into two buckets, each containing a sub-aggregate for each IP address for that time period.
Now that you kind of see what's going on, let's take a look at the IPTracker.cfc ColdFusion component:
IPTracker.cfc
<cfcomponent
output="false"
hint="I track IP addresses over time for abuse protection.">
<cffunction
name="init"
access="public"
returntype="any"
output="false"
hint="I initialize this component.">
<!---
I contain a collection of all the time frame
buckets that are being accounted for. Each bucket is
a hash of all the IP addresses that were tracked in
the time frame represented by each bucket.
--->
<cfset variables.buckets = [] />
<!---
I am the time span in seconds of each bucket. The
smaller the bucket duration, the more accurate, but
the more memory / processing required to keep track.
--->
<cfset variables.bucketDuration = 3 />
<!---
I am the time span in seconds of valid tracking.
Meaning, I am the amount of time a request from a
given IP address will be kept.
--->
<cfset variables.trackingDuration = 60 />
<!---
I am the threshold sum of all the requests made by a
given IP address within the given valid duration. Once
this threshold has been reached, requests made by a
given IP address will be considered invalid.
--->
<cfset variables.validThreshold = 5 />
<!---
I am the number of buckets we need to keep based on
the time spans above.
--->
<cfset variables.bucketCount = ceiling(
variables.trackingDuration / variables.bucketDuration
) />
<!---
I am the starting point in time for the tracking.
This will be used to figure out the bucket keys.
--->
<cfset variables.startTime = now() />
<!--- Return this component reference. --->
<cfreturn this />
</cffunction>
<cffunction
name="getBucketKey"
access="public"
returntype="numeric"
output="false"
hint="I return the bucket key based on the given time.">
<!--- Define arguments. --->
<cfargument
name="timeStamp"
type="date"
required="true"
hint="I am the time being translated into a bucket key."
/>
<!--- Define the local scope. --->
<cfset var local = {} />
<!---
Get the number of seconds that have passed since
the tracking started.
--->
<cfset local.offset = dateDiff(
"s",
variables.startTime,
arguments.timeStamp
) />
<!--- Create the bucket key. --->
<cfset local.bucketKey = (
local.offset -
(local.offset mod variables.bucketDuration)
) />
<!---
Check to see if the difference since tracking was
started is more than a day. If so, then reset it
(simply to keep the number from getting too large).
NOTE: The key itself is not really that critical.
--->
<cfif (local.offset gt 86400)>
<!--- Reset the start time for next run. --->
<cfset variables.startTime = now() />
</cfif>
<!--- Return the bucket key. --->
<cfreturn local.bucketKey />
</cffunction>
<cffunction
name="getBuckets"
access="public"
returntype="array"
output="false"
hint="I return the buckets.">
<cfreturn variables.buckets />
</cffunction>
<cffunction
name="hasIPReachedThreshold"
access="public"
returntype="boolean"
output="false"
hint="I determine if the given IP address has reached the request threshold.">
<!--- Define arguments. --->
<cfargument
name="ip"
type="string"
required="true"
hint="I am the IP address being tracked."
/>
<!--- Define the local scope. --->
<cfset var local = {} />
<!---
We need to sum the hit count for this IP address
within the tracked time frame.
--->
<cfset local.hitCount = 0 />
<!--- Loop over the buckets to sum hits. --->
<cfloop
index="local.bucket"
array="#variables.buckets#">
<!---
Check to see if the IP exists in this bucket
(which would indicate a hit count).
--->
<cfif structKeyExists( local.bucket.ips, arguments.ip )>
<!--- Add the count to the running total. --->
<cfset local.hitCount += local.bucket.ips[ arguments.ip ] />
</cfif>
</cfloop>
<!---
Now that we have summed the hit count for the given
IP address with the given time tracking, return
whether or not the total has reached the threshold.
--->
<cfreturn (local.hitCount gt variables.validThreshold) />
</cffunction>
<cffunction
name="paramFirstBucket"
access="public"
returntype="void"
output="false"
hint="I param the first bucket.">
<!--- Define the local scope. --->
<cfset var local = {} />
<!--- Get the bucket key for this time. --->
<cfset local.bucketKey = this.getBucketKey( now() ) />
<!---
Check to see if we need to create a new bucket for the
given key. New buckets will always be prepended to the
buckets collection, so all we need to do is check for
the first bucket.
--->
<cfif (
!arrayLen( variables.buckets ) ||
(variables.buckets[ 1 ].key neq local.bucketKey)
)>
<!---
Create a new bucket to prepend. Each bucket needs
a key as well as a hash of the IPs being tracked.
--->
<cfset local.bucket = {
key = local.bucketKey,
ips = {}
} />
<!--- Prepend the bucket. --->
<cfset arrayPrepend( variables.buckets, local.bucket ) />
<!---
Now that we have augmented the bucket collection,
check to see if we now have more buckets than we
want to be tracking.
--->
<cfif (arrayLen( variables.buckets ) gt variables.bucketCount)>
<!---
The buckets at the end are now beyond the time
span of our valid IP tracking. Delete them.
--->
<cfloop
index="local.bucketIndex"
from="#arrayLen( variables.buckets )#"
to="#(variables.bucketCount + 1)#"
step="-1">
<!--- Delete last bucket. --->
<cfset arrayDeleteAt(
variables.buckets,
local.bucketIndex
) />
</cfloop>
</cfif>
</cfif>
<!--- Return out. --->
<cfreturn />
</cffunction>
<cffunction
name="trackIP"
access="public"
returntype="boolean"
output="false"
hint="I track the given IP address and return a boolean as to whether the given IP address has exceeded the threshold.">
<!--- Define arguments. --->
<cfargument
name="ip"
type="string"
required="true"
hint="I am the IP address being tracked."
/>
<!--- Define the local scope. --->
<cfset var local = {} />
<!---
Param the first bucket (as this is the one in which we
will be taking immediate action).
--->
<cfset this.paramFirstBucket() />
<!---
ASSERT: At this point, we know that we have a first
bucket and that it is the bucket we want to take
direct action on.
--->
<!--- Param the IP address in the bucket. --->
<cfparam
name="variables.buckets[ 1 ].ips[ arguments.ip ]"
type="numeric"
default="0"
/>
<!--- Increment the IP hit count in this bucket. --->
<cfset variables.buckets[ 1 ].ips[ arguments.ip ]++ />
<!---
Now that we have updated the buckets, return whether
or not the IP address has reached the threshold.
--->
<cfreturn this.hasIPReachedThreshold( arguments.ip ) />
</cffunction>
</cfcomponent>
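One thing worth noting about the component as written: buckets are only created when a request comes in, and they are only trimmed by count. During a quiet stretch, a bucket can therefore stay in the collection well past the tracking duration and still be summed. Here is a rough sketch of a time-based purge that could complement the count-based one. It assumes each bucket also stores a createdAt date, which the component above does not do; the sample buckets and the 10-minute age are made up for the demo.

```cfml
<!--- ASSUMPTION: A 60-second tracking window, as in the component. --->
<cfset trackingDuration = 60 />

<!---
	HYPOTHETICAL DATA: Each bucket carries a "createdAt" date in
	addition to its key and IP hash. Newest bucket is first.
--->
<cfset newBucket = { key = 3, createdAt = now(), ips = {} } />
<cfset oldBucket = { key = 0, createdAt = dateAdd( "n", -10, now() ), ips = {} } />
<cfset buckets = [ newBucket, oldBucket ] />

<!--- Drop buckets from the end while they are older than the window. --->
<cfloop condition="arrayLen( buckets ) gt 0">

	<cfset lastBucket = buckets[ arrayLen( buckets ) ] />

	<!--- The oldest remaining bucket is still valid, so stop here. --->
	<cfif (dateDiff( "s", lastBucket.createdAt, now() ) lte trackingDuration)>
		<cfbreak />
	</cfif>

	<cfset arrayDeleteAt( buckets, arrayLen( buckets ) ) />

</cfloop>

<!--- Only the bucket created "now" should remain. --->
<cfdump var="#buckets#" label="Buckets After Purge" />
```

Because the buckets are kept newest-first, the loop only ever inspects the tail of the array and stops at the first bucket that is still inside the window, so the work stays proportional to the number of expired buckets.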
So anyway, that's what I've got. Again, this was not so much a solution as it was just (hopefully) the basis of some good conversation. I understand that much more thread safety thinking has to be put into place; I left that out of this in order to keep things simple.
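As a starting point for the thread-safety discussion, here is one blunt option: cache a single tracker instance and wrap every trackIP() call in an exclusive named lock. This is only a sketch under a few assumptions of mine: that the application has an Application.cfc (so the application scope exists), that the tracker lives at application.ipTracker, and that the lock names are arbitrary. A single lock serializes all tracking calls, so it trades throughput for simplicity.

```cfml
<!---
	ASSUMPTION: The tracker is cached in the application scope so
	that all requests share one instance.
--->
<cfif NOT structKeyExists( application, "ipTracker" )>
	<cflock name="ipTrackerInit" type="exclusive" timeout="5">

		<!--- Check again inside the lock in case another request won. --->
		<cfif NOT structKeyExists( application, "ipTracker" )>
			<cfset application.ipTracker = createObject( "component", "IPTracker" ).init() />
		</cfif>

	</cflock>
</cfif>

<!--- Serialize all reads and writes of the bucket data. --->
<cflock name="ipTrackerTrack" type="exclusive" timeout="5" throwontimeout="true">
	<cfset isOverLimit = application.ipTracker.trackIP( cgi.remote_addr ) />
</cflock>

<!--- Reject the request if this IP has been flagged. --->
<cfif isOverLimit>
	<cfheader statuscode="429" statustext="Too Many Requests" />
	<cfabort />
</cfif>
```

Locking around the call, rather than inside the component, keeps IPTracker.cfc unchanged; finer-grained locking inside the component would likely perform better but takes more care to get right.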
Reader Comments
You might check out Open BlueDragon's CFTHROTTLE tag for some additional implementation ideas:
http://wiki.openbluedragon.org/wiki/index.php/CFTHROTTLE
@Matt,
Looks interesting - definitely the same concept (tracking hits per IP per time period). Is there a way to look at the source code they have? Or is that all in Java classes?
Yep, that's what open source is all about. ;-) Since this is native to the engine it's in Java of course, but if you grab the source and navigate to com/naryx/tagfusion/cfm/tag/ext/cfTHROTTLE.java you can check things out.
I'm curious.... how many requests are too many?
Hey Ben,
Yeah, good post. I had run into the same problem, and this is a code sample of what I have been using successfully to handle the issue for quite some time now with no problems.
Note this is a somewhat hacked excerpt of my request facade object (so some of the generic methods are inherited but not shown, as well as some other code being called, but only to get some mentioned config vars), but it at least gives you an idea.
This works via a struct (fast) keyed on ip address, then an array of times per key. It is thread safe also.
Let me know if you have any questions and keep up the great blog :)
<!---
values currently returned from getEngine().getGeneral()
requestsResetTime = 24 <!-- hours -->
requestsTimeframe = 10 <!-- seconds -->
requestsInTimeframeByIpMax = 50
Other code is removed to make this easier to follow
--->
<cfcomponent
displayname="RequestFacade"
extends="application.model.core.Facade"
output="false"
hint="RequestFacade creates an interface to store request based code">
<!---
PROPERTIES
--->
<!---
INITIALIZATION / CONFIGURATION
--->
<cffunction name="init" access="public" output="false" returntype="application.model.core.facade.RequestFacade"
hint="Constructor for this CFC.">
<cfargument name="engine" type="application.model.core.Engine" required="true" />
<cfset super.init(arguments.engine) />
<cfset resetRequests() />
<cfreturn this />
</cffunction>
<!---
ACCESSORS
--->
<cffunction name="getRequests" access="public" returntype="struct" output="false">
<cflock type="readonly" name="#getUniqueLockId('requests')#" timeout="10" throwontimeout="true">
<cfreturn variables.instance.requests />
</cflock>
</cffunction>
<cffunction name="setRequests" access="private" returntype="void" output="false">
<cfargument name="requests" type="struct" required="true" />
<cflock type="exclusive" name="#getUniqueLockId('requests')#" timeout="10" throwontimeout="true">
<cfset variables.instance.requests = arguments.requests />
</cflock>
</cffunction>
<cffunction name="getRequestsStart" access="public" returntype="date" output="false">
<cflock type="readonly" name="#getUniqueLockId('requestsStart')#" timeout="10" throwontimeout="true">
<cfreturn variables.instance.requestsStart />
</cflock>
</cffunction>
<cffunction name="setRequestsStart" access="private" returntype="void" output="false">
<cfargument name="requestsStart" type="date" required="true" />
<cflock type="exclusive" name="#getUniqueLockId('requestsStart')#" timeout="10" throwontimeout="true">
<cfset variables.instance.requestsStart = arguments.requestsStart />
</cflock>
</cffunction>
<!---
PUBLIC FUNCTIONS
--->
<cffunction name="onRequestStart" access="public" output="false" returntype="void"
hint="Should be fired when the request starts from the Application bootstrapper.">
<cfargument name="targetPage" type="string" required="true" />
<!--- run specific actions --->
<cfset purgeRequestsIfNeeded() />
<cfset manageRequests() />
<cfreturn />
</cffunction>
<!---
PROTECTED FUNCTIONS
--->
<cffunction name="resetRequests" access="private" returntype="void" output="false">
<cfset setRequests(structNew()) />
<cfset setRequestsStart(now())>
</cffunction>
<cffunction name="purgeRequestsIfNeeded" access="private" returntype="void" output="false">
<cfif dateDiff("H", getRequestsStart(), now()) gt getEngine().getGeneral().getRequestsResetTime()>
<cfset resetRequests() />
</cfif>
</cffunction>
<cffunction name="getRequestsByIp" access="private" returntype="array" output="false">
<cfargument name="ip" type="string" required="true" />
<cfset var requests = getRequests() />
<cflock type="exclusive" name="#getUniqueLockId('request_#arguments.ip#')#" timeout="10" throwontimeout="true">
<cfif not structKeyExists(requests, arguments.ip)>
<cfset requests[arguments.ip] = arrayNew(1) />
</cfif>
<cfreturn requests[arguments.ip] />
</cflock>
</cffunction>
<cffunction name="getRequestByIp" access="private" returntype="date" output="false">
<cfargument name="ip" type="string" required="true" />
<cfargument name="index" type="numeric" required="true" />
<cfset var requests = getRequests() />
<cflock type="exclusive" name="#getUniqueLockId('request_#arguments.ip#')#" timeout="10" throwontimeout="true">
<cfreturn requests[arguments.ip][arguments.index] />
</cflock>
</cffunction>
<cffunction name="appendRequestByIp" access="private" returntype="void" output="false">
<cfargument name="ip" type="string" required="true" />
<cfargument name="timestamp" type="date" required="false" default="#now()#" />
<cfset var requests = getRequests() />
<cflock type="exclusive" name="#getUniqueLockId('request_#arguments.ip#')#" timeout="10" throwontimeout="true">
<cfset arrayAppend(requests[arguments.ip], arguments.timestamp) />
</cflock>
</cffunction>
<cffunction name="deleteRequestByIp" access="private" returntype="void" output="false">
<cfargument name="ip" type="string" required="true" />
<cfargument name="index" type="numeric" required="true" />
<cfset var requests = getRequests() />
<cflock type="exclusive" name="#getUniqueLockId('request_#arguments.ip#')#" timeout="10" throwontimeout="true">
<cfset arrayDeleteAt(requests[arguments.ip], arguments.index) />
</cflock>
</cffunction>
<cffunction name="getRequestSizeByIp" access="private" returntype="numeric" output="false">
<cfargument name="ip" type="string" required="true" />
<cfreturn arrayLen(getRequestsByIp(arguments.ip)) />
</cffunction>
<cffunction name="manageRequests" access="private" returntype="void" output="false">
<cfset var ip = getEngine().getRuntime().getIpAddress() />
<cfset var requestsTimeframe = getEngine().getGeneral().getRequestsTimeframe() />
<cfset var requests = getRequests() />
<cfset var requestsSize = getRequestSizeByIp(ip) />
<cfset var index = 0 />
<cflock type="exclusive" name="#getUniqueLockId('request_#ip#')#" timeout="10">
<cfloop condition="true">
<cfset index = index + 1>
<cfif index gt requestsSize>
<cfbreak />
</cfif>
<cfif dateDiff("S", getRequestByIp(ip, index), now()) gt requestsTimeFrame>
<cfset deleteRequestByIp(ip, index)>
<cfset index = index - 1>
<cfset requestsSize = requestsSize - 1>
<cfelse>
<cfbreak />
</cfif>
</cfloop>
<cfset appendRequestByIp(ip, now()) />
<cfif requestsSize gt getEngine().getGeneral().getRequestsInTimeframeByIpMax()>
<cfthrow
type="application.model.core.facade.MaximumRequestsByIpExceeded"
message="Your IP address: #ip# has exceeded the maximum number of requests within the allowed time frame." />
</cfif>
</cflock>
</cffunction>
</cfcomponent>
@Shuns,
It looks cool, but I am not seeing where you purge old IP request data. For example, say someone hits your site, and then doesn't come back - where is that IP address eventually purged from your tracking?
Fancy!
Last time I tried blocking spam with IP traffic it was with CF5. I don't remember the exact code I used; I no longer maintain the site or have access to the code.
Tracked IPs were kept in the application scope and dumped every 5 minutes. Each IP had a startTrackingTime and an IPHitCount. If the IP hit more than a certain frequency, it was removed from tracking and placed in a banned list. The frequency was checked by something like <cfif IPHitCount / dateDiff("s", startTrackingTime, now()) gt 5>ban the IP</cfif>
Banned IPs were simply <cfabort>ed.
Hi Ben,
All requests are purged if needed on request start via purgeRequestsIfNeeded()
(Which in turn fires resetRequests() if conditions are met)
Also, within manageRequests(), which is the overall checking method, individual ones are removed if needed via deleteRequestByIp(ip, index)
Does that help?
@Shuns,
Hmmm, I think I am missing something; but no worries - I get the general drift.
@Ben
Well, it goes like this: every request fires a method that will purge/reset the whole struct if needed; after that, it checks the IP, removing old entries from the struct if they should be ejected. It then records the new request, and finally checks to see if the IP has reached the limit; if so, it throws an error.
HTH
This seems like something some open source message boards would already be doing to address getting spammed. Anyone know of one off hand that may have done this?