Creating Tiny URLs Using ColdFusion
After building my custom ColdFusion radix conversion methods yesterday, I decided to give creating tiny URLs with ColdFusion a shot. Just as with most of the tiny URL services you see on the internet, the demonstration below either automatically generates a tiny URL for you, or it lets you specify a custom short URL (which it will use if the suggested URL has not already been taken). Then, of course, clicking on the short URL tracks click counts and forwards the user to the target URL (forwarding is not done in my demonstration).
As you will see in the code below, the short URL that gets stored in the database is actually a primary key on the data table. This is a very important point because it means that the database will throw a primary key violation if we try to insert a duplicate short URL. Because of this, we are able to offload some of the heavy lifting to the database itself.
To understand the beauty of this, let's think about how we might ordinarily handle this kind of a situation. Traditionally, when worrying about inserting unique values into a datatable, we might do the following:
- Query database for value.
- Check to see if query returned a value.
- If no record returned, insert new value.
Of course, in this model, we run a large risk of having a parallel thread insert our target value just after we tried to read it; then, we go to write something that was just written (creating dirty data). To get around this, however, we might serialize the database read/writes:
- Lock datatable for serialized access.
- Check to see if query returned a value.
- If no record returned, insert new value.
- Release lock on datatable.
This works, but now we have ColdFusion handling some threading that one might argue is really best left to the database (as the database is more highly optimized to handle conflicts). Also, this assumes that every write to the database will cause a conflict, which it will not. In cases where no conflict will exist, we are single threading for no reason. And, I believe it also locks any unrelated reads to the same datatable (but I am not well educated at this level). I am sure there are transaction settings that you can use to tweak this on the ColdFusion side, but ultimately, I think by leveraging the database's own conflict resolution, we can speed up the process for every process involved.
As such, you will see that the following code actually leverages and embraces primary key errors thrown by the database. And, in doing so, you will see that in a best case scenario, only one database call will need to be made to execute the given request.
And so, without further delay, my Tiny URL ColdFusion demo:
<!--- Include utility functions. --->
<cfinclude template="_functions.cfm" />
<!--- Param form variables. --->
<cfparam name="FORM.url" type="string" default="" />
<cfparam name="FORM.short_url" type="string" default="" />
<cfparam name="FORM.submitted" type="boolean" default="false" />
<!--- Create an array of errors. --->
<cfset arrErrors = [] />
<!---
Create a variable to hold the generated url. This will
only be populated if the form gets processed.
--->
<cfset strShortURL = "" />
<!---
Get our collection of available characters that can be used
when creating character-sensitive tiny URLs.
--->
<cfset arrCharacters = ListToArray(
"a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9",
" "
) />
<!--- Check to see if the form was submitted. --->
<cfif FORM.submitted>
<!--- Validate our form data. --->
<cfif NOT Len( FORM.url )>
<cfset ArrayAppend(
arrErrors,
"Please enter your target URL to shorten."
) />
</cfif>
<!---
Check to see if our suggested short url has any
characters that we don't want to handle.
--->
<cfif REFind( "[^\w\-_]", FORM.short_url )>
<cfset ArrayAppend(
arrErrors,
"Your tiny url can only contain letters, numbers, dashes, and underscores."
) />
</cfif>
<!---
Check to see if we have any errors so far. If not,
then we have enough data to process.
--->
<cfif NOT ArrayLen( arrErrors )>
<!---
Now that we are going to create the new URL, we
need to see if they suggested a tiny URL. If so,
we are going to TRY to use that.
--->
<cfif Len( FORM.short_url )>
<!---
Try to insert the short url. If the short URL is
already taken, then the database will throw an
exception (its a primary key).
--->
<cftry>
<!--- Insert the new short url. --->
<cfquery name="qInsert" datasource="ben">
INSERT INTO short_url
(
is_custom,
short_url,
numeric_value,
target_url,
click_count,
date_last_clicked,
date_created
) VALUES (
<cfqueryparam value="1" cfsqltype="cf_sql_tinyint" />, <!--- is_custom. --->
<cfqueryparam value="#FORM.short_url#" cfsqltype="cf_sql_varchar" />, <!--- short_url. --->
<cfqueryparam value="0" cfsqltype="cf_sql_integer" />, <!--- numeric_value. --->
<cfqueryparam value="#FORM.url#" cfsqltype="cf_sql_varchar" />, <!--- target_url. --->
<cfqueryparam value="0" cfsqltype="cf_sql_integer" />, <!--- click_count. --->
<cfqueryparam value="" cfsqltype="cf_sql_timestamp" null="true" />, <!--- date_last_clicked. --->
<cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp" /> <!--- date_created. --->
);
</cfquery>
<!---
If we have gotten this far, then we have
successfully inserted the new short URL.
Store the code that was used.
--->
<cfset strShortURL = FORM.short_url />
<!--- Catch primary key exception. --->
<cfcatch>
<cfset ArrayAppend(
arrErrors,
"The suggested tiny url label is already in use."
) />
</cfcatch>
</cftry>
<cfelse>
<!---
Now short URL was suggested, so, we need to
generate one ourselves. To do so, we are going to
grab the largest non-custom value and then keep
incrementing it until we insert one successfully.
--->
<!--- Grab the largest custom value. --->
<cfquery name="qMaxCustom" datasource="ben">
SELECT
MAX( numeric_value ) AS numeric_value
FROM
short_url
</cfquery>
<!---
Get our base numeric value by adding one to the
max numberic value.
--->
<cfset intNextValue = (Val( qMaxCustom.numeric_value ) + 1) />
<!---
Now that we have this value, we are going to
create a short URL based on our custom-radix
encoding of this unique value. Because someone
else might be submitting this form at the same
time, we are going to keep looping until the
database stops throwing primary key errors (on
the short_url).
--->
<cfloop condition="true">
<!---
Get the short-url encoding of our next
numeric value.
--->
<cfset strShortURL = FormatBaseNData(
intNextValue,
arrCharacters
) />
<!---
Try to insert the new value. If this short URL
is already taken, database will throw an error
(primary key violation).
--->
<cftry>
<!--- Insert the new short url. --->
<cfquery name="qInsert" datasource="ben">
INSERT INTO short_url
(
is_custom,
short_url,
numeric_value,
target_url,
click_count,
date_last_clicked,
date_created
) VALUES (
<cfqueryparam value="0" cfsqltype="cf_sql_tinyint" />, <!--- is_custom. --->
<cfqueryparam value="#strShortURL#" cfsqltype="cf_sql_varchar" />, <!--- short_url. --->
<cfqueryparam value="#intNextValue#" cfsqltype="cf_sql_integer" />, <!--- numeric_value. --->
<cfqueryparam value="#FORM.url#" cfsqltype="cf_sql_varchar" />, <!--- target_url. --->
<cfqueryparam value="0" cfsqltype="cf_sql_integer" />, <!--- click_count. --->
<cfqueryparam value="" cfsqltype="cf_sql_timestamp" null="true" />, <!--- date_last_clicked. --->
<cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp" /> <!--- date_created. --->
);
</cfquery>
<!---
If we have gotten this far, then we have
successfully inserted the new short URL.
Break out of loop and proceed with the
selected short url.
--->
<cfbreak />
<!--- Catch primary key exception. --->
<cfcatch>
<!---
Short url was already taken. Increment
the next value index and try again.
--->
<cfset intNextValue++ />
</cfcatch>
</cftry>
</cfloop>
</cfif>
</cfif>
</cfif>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>ColdFusion Tiny URL Demo</title>
</head>
<body>
<h1>
ColdFusion Tiny URL Demo
</h1>
<cfoutput>
<!--- Check to see if a short URL was generated. --->
<cfif Len( strShortURL )>
<p>
<strong>Generated URL:</strong>
<a
href="./go/?#strShortURL#"
target="shorturl"
>./go/?#strShortURL#</a>
</p>
</cfif>
<!--- Check to see if we have any errors to display. --->
<cfif ArrayLen( arrErrors )>
<p>
<strong>Please review the following:</strong>
</p>
<ul>
<cfloop
index="strError"
array="#arrErrors#">
<li>
#strError#
</li>
</cfloop>
</ul>
</cfif>
<form action="#CGI.script_name#" method="post">
<!--- Submission flag. --->
<input type="hidden" name="submitted" value="true" />
<p>
<label>
Target URL:<br />
<input
type="text"
name="url"
value="#FORM.url#"
size="60"
maxlength="500"
/>
</label>
</p>
<p>
<label>
Tiny URL <em>(optional)</em>:<br />
<input
type="text"
name="short_url"
value="#FORM.short_url#"
size="30"
maxlength="30"
/>
<br />
</label>
</p>
<p>
<input type="submit" value="Get Short URL" />
</p>
</form>
</cfoutput>
</body>
</html>
At the top of the code, you will see that I am including a _functions.cfm template. This is where I am including the FormatBaseNData() ColdFusion UDF that I wrote yesterday. Since I am only doing a proof of concept, my short URL is being tested as a query string rather than a 404 / URL rewrite handler. Here is the local page that handles the short URL if you click on the generated link (no error checking has been added to this portion of the demo):
<!--- Query for the tiny url. --->
<cfquery name="qShortURL" datasource="ben">
<!--- Increment the click count. --->
UPDATE
short_url
SET
click_count = (click_count + 1),
date_last_clicked = <cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp" />
WHERE
short_url = <cfqueryparam value="#CGI.query_string#" cfsqltype="cf_sql_varchar" />
;
<!--- Gath short URL data. --->
SELECT
u.short_url,
u.target_url,
u.click_count
FROM
short_url u
WHERE
short_url = <cfqueryparam value="#CGI.query_string#" cfsqltype="cf_sql_varchar" />
;
</cfquery>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>ColdFusion Tiny URL Demo</title>
</head>
<body>
<h1>
ColdFusion Tiny URL Demo
</h1>
<cfoutput>
<p>
<strong>Short URL:</strong>
#qShortURL.short_url#
</p>
<p>
<strong>Target URL:</strong>
#qShortURL.target_url#
</p>
<p>
<strong>Click Count:</strong>
#qShortURL.click_count#
</p>
</cfoutput>
</body>
</html>
So there you have it - my first attempt to create a tiny URL system in ColdFusion. As a final note, I just wanted to point out because I am using both lowercase and uppercase values in my short URLs, it is important that my database table treat the short_url column as case sensitive. In MySQL, in order to do that, I had to change the COLLATION of the short_url field to be (UTF_bin). This treats the string as a binary value which MySQL will always compare using case sensitive operations.
Want to use code from this post? Check out the license.
Reader Comments
Nice idea with the primary key exception! Still, the larger the table the slower this approach gets. A table with a thousand entries would throw 1000 exceptions in a row before being able to insert a new value, right?
So some kind of hashing with few collisions would be advisable, that's what all these tinyurl services seem to do?
@Martin,
Before it starts to insert primary key values, the code gets the largest numeric value (for non-custom values). At that point, it starts to try inserting, incrementing the value each time it fails. So, in a best case scenario, it would be:
1. Get highest value.
2. Increment value.
3. Insert value.
If there were conflicts being created, the page flow would be:
1. Get highest value.
2. Increment value.
3. Insert value.
4. Catch exception (goto #2)
although this approach is good. i think a better approach would be to use to use an algorithm to generate the tinyurls instead of letting the user enter them.
you could generate a SHA1 and use the first 7 or so character from it (similar to what you can do in git to access commits) an use that as the url ids.
@Tony,
I think you can alter the code to use a different auto-generation algorithm AND still allow the user to enter their own. I don't know if you watched the video, but the ability to enter your own tiny URL is an optional feature; if left blank, the system will auto-generate one.
Ah, I see! Amazing :)
@Martin,
I am not sure if it's the best way to handle the primary key conflicts - but, there is something that feels really elegant about it.
These last two posts have been directly in line with some work I've been doing with generating keys to make urls simpler.
The url shortener is on docket for later this week, but I'll be keeping it simple with just 36 radix.
I do random generation so people can't just step up the sequence.
It's good to see your work for similar solutions.
@Greg,
Awesome. When you post your thoughts, be sure to drop a link in here so we can take a look.
I built something like this last year at work and used a slick trick a colleague of mine showed me: autoincrement your url's id column of regular old type integer, but then lookup or output on the fly in base 36, using formatBaseN and inputBaseN. Clever eh?
@Jason,
This uses a similar technique. However, since people can suggest their own small url value, I didn't want to use the auto-increment column as then I would be skipping over available incremental values. As such, I do keep a secondary "numeric_value" column in which I am doing a BaseN custom radix conversion.
Ben, you should do more video's! or even a kinky podcast! Would be a great addition to the blog
@Will,
Thank man. I will try to put more videos up here.
How about the EXISTS test to cut down on the overhead of leaving the DB engine to determine the PK clash (referring to Martin's first comment)? Not sure if it's more performant over large sets or not, but something like this would still generate an error for the CFTRY block with more control over the DB handling:
IF NOT EXISTS (
SELECT short_url
FROM short_url
WHERE short_url = <cfqueryparam value="#strShortURL#" cfsqltype="cf_sql_varchar" />
)
BEGIN
INSERT INTO short_url
(
...
) VALUES (
...
)
END
ELSE
BEGIN
RAISERROR('The suggested tiny url label is already in use.', 20, 1)
END
@Jason,
That might work for the case where the user suggests a short url. But, is manually raising an error in the SQL really any different from letting the database raise an PKEY duplication error?
Yep, that's the question. As I said, not sure it's an improvement, but I do think Martin was on to something with his concern that waiting for the DB to bubble through all possible PK values before throwing may become a performance issue as the table size grows. It is also possible, however, that EXISTS also takes just as much overhead ... I'll give this a test if I get a few minutes today.
@Jason & @Ben
I could imagine that the database internally holds the max value of each column separately. Thus every DB query of that kind should be processed in a constant amount of time. But I'm interessted in your test results!
Well, testing with 100,000 UUIDs in the PK, it really appears that both approaches (using EXISTS to pre-test vs simply throwing the repeat value in and waiting for the thrown error) consistently take 0 milliseconds on the DB side. If you scale that out to several million maybe it makes a difference, but my guess is that the PK indexing is good enough to make performance a wash in either case.
@Jason,
Thanks for testing that! Good to hear that it's fast either way.
Although this post didn't help me directly, the concept got me started so thanks (as usual) Ben!
I thought I'd share my final function here and get some feedback.
This (obviously) wouldn't check for duplicates but should get someone going.
Finally, I'm doing a simple BINARY call in MySQL to do a case sensitive query to my object table.
@Brian,
Glad to, at the very least, help you get the motor running. I need to look into binary-comparisons in SQL. I always see them used and I can never remember how they work exactly. I know people have explained it to me before, but it's one of those things I never use (and therefore never commit to memory).
Perhaps it's time for a blog post on the matter to drive the point home.
Not sure if you are still looking at this post or not, but here goes.
I want to use this, but need to have the ?shortword
appear just after the www.aiag.org/?shortword.
However, it is not recogizing the /?shortword, it only sees the default file.
Any ideas you could pass back would be helpful.
Ben, This is great. I have a question.
I tried using cflocation to redirect shortened url's but I found a bug.
For some reason when you use cflocation, cookies on destination links don't get set. It's like the browser completely bypasses it.
Is there a workaround to ensure that the shortened link passes the destination link through the header so the browser picks it up?