Downloading Email Attachments With CFPop And CFThread
After being inspired by some stuff that Simon Free did, I decided to get back to playing around with ColdFusion's CFPOP tag. I have experimented with reading emails using CFPop before, but that didn't take any attachments into account. So, I thought it would be a good next step to test the use of ColdFusion and CFPop to download email attachments. On the surface, this seems very simple; the mechanisms for reading the emails and downloading the files are quite straightforward. The devil, however, is very much in the details, especially when you try to rub a little optimization on the problem area.
CFPop is slow. We are downloading files from a third-party email server to our local ColdFusion server. If you have one megabyte of file attachments, that's one megabyte that needs to be transferred while the CFPop tag is executing. And that's just one email; if you have many emails, each with attachments, you can see how this could become an extremely slow process. To deal with this, and to make sure that the main page does not time out while waiting for our POP commands to execute, we can wrap each CFPop tag in its own CFThread tag so that it executes asynchronously to the main page request.
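The fire-and-forget pattern that CFThread provides can be sketched outside ColdFusion. The Python below is only an analogy. `slow_download` is a hypothetical stand-in for one slow CFPop call, and a `threading.Event` simulates the transfer time so that the result is deterministic. The "request" function returns while every download is still pending.

```python
import threading

def slow_download(uid, release, done):
    """Hypothetical stand-in for one slow CFPop getall call."""
    release.wait()   # pretend the transfer lasts until `release` is set
    done.append(uid)

def handle_request(uids, release, done):
    """Start one worker thread per email and return without joining them.

    This is the role CFThread plays in the demo: the page request
    (this function) hands off the slow work and keeps going.
    """
    workers = []
    for uid in uids:
        worker = threading.Thread(target=slow_download, args=(uid, release, done))
        worker.start()
        workers.append(worker)
    return workers

release = threading.Event()
done = []
workers = handle_request(["uid-1", "uid-2", "uid-3"], release, done)
completed_at_return = len(done)  # the "request" is finished, the downloads are not
print(completed_at_return)       # 0

release.set()                    # let the simulated transfers finish
for worker in workers:
    worker.join()
print(sorted(done))              # ['uid-1', 'uid-2', 'uid-3']
```

The final `join()` calls exist only so that the example can show that the work does complete. The point of the pattern is that the request itself never has to wait.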
ColdFusion's CFThread tag, while an amazing tool, does introduce some more problems. Well, not so much problems as things to be aware of. For starters, you have to be careful about making parallel CFPop requests. I don't know if this is true for all POP accounts, but the one I use only allows one CFPop tag to connect to it at a time. If multiple CFPop tags attempt to interact with my account at once, such as from two parallel CFThreads, the server throws this error:
[IN-USE] This account is being used by another session. Please try again in a few minutes.
To get around this, we can wrap our CFPop tags in exclusive, named CFLock tags. This way, even if two threads are executing in parallel, each one still has to wait for the previous CFPop request to finish executing. I know this might sound like it negates the use of the CFThread tag, but remember that these threads still run asynchronously to our primary thread (the original page request), so we are still reaping the benefits.
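The named-lock idea can be illustrated with a minimal Python sketch. `named_lock()` is a hypothetical helper, not a ColdFusion or Python API. It hands every caller that uses the same name the same `threading.Lock`, which is roughly what `<cflock name="..." type="exclusive">` does for requests that share a lock name. The counters record how many simulated POP sessions are ever open at once.

```python
import threading
import time

# Hypothetical registry: one Lock object per lock name.
_registry_guard = threading.Lock()
_named_locks = {}

def named_lock(name):
    """Return the single Lock shared by every caller using this name."""
    with _registry_guard:
        return _named_locks.setdefault(name, threading.Lock())

active = 0       # POP "sessions" open right now
max_active = 0   # highest concurrency ever observed
counter_guard = threading.Lock()

def pop_session(uid):
    """Simulate a POP session that the server only allows one at a time."""
    global active, max_active
    with named_lock("cfpop-bennadel"):
        with counter_guard:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.01)  # pretend to talk to the POP server
        with counter_guard:
            active -= 1

threads = [threading.Thread(target=pop_session, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)  # 1: the named lock serialized the sessions
```

The five threads still run off the main thread. Only their POP sessions are forced to run one after another.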
Adding the CFThread tag also introduces another issue: context. As we loop over the emails that we want to pull down, we use the UID value of the current query loop iteration (see the example below). The problem is that inside of the CFThread tag body, there is no query loop context; remember, the CFThread is running in parallel to the main page request. Therefore, we have to pass the UID value into the CFThread tag as an attribute and access it via the thread's ATTRIBUTES scope.
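Python has no query-loop context. Its late-binding closures, however, produce a similar class of bug, so they make a useful analogy. In this sketch the UIDs are made-up values. The first set of threads reads `uid` when each thread runs, not when it was created, so every thread sees the last value. The second set receives its UID as an argument, which is the same idea as passing `uid="#qPostEmail.uid#"` to CFThread and reading `ATTRIBUTES.uid`.

```python
import threading

uids = ["101", "102", "103"]   # made-up stand-ins for qPostEmail.uid values

# Hazard: the lambda looks up `uid` when the thread runs, after the loop
# has already moved on to its final value.
late = []
late_threads = []
for uid in uids:
    late_threads.append(threading.Thread(target=lambda: late.append(uid)))
for t in late_threads:
    t.start()
    t.join()
print(late)     # ['103', '103', '103']

# Fix: hand each thread its own copy of the value, like CFThread ATTRIBUTES.
def worker(attributes, out):
    out.append(attributes["uid"])

passed = []
passed_threads = []
for uid in uids:
    passed_threads.append(
        threading.Thread(target=worker, args=({"uid": uid}, passed)))
for t in passed_threads:
    t.start()
    t.join()
print(passed)   # ['101', '102', '103']
```

The threads are started and joined one at a time here only so that the printed order is deterministic.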
That's a lot to keep track of. And, considering that the CFThread tag dies silently, if you forget one of the above caveats, it can be a pain in the butt to debug. That being said, let's take a look at the demo code that pulls these ideas together:
<!--- Set up attribute collection for CFPop tag. --->
<cfset CFPopAttributes = {
    server = REQUEST.Pop.Server,
    port = REQUEST.Pop.Port,
    username = REQUEST.Pop.Username,
    password = REQUEST.Pop.Password
    } />
<!---
    Because POP accounts can be tricky when it comes to
    access, let's put an exclusive, named lock around all
    of our POP activities. For the Name, we are going to
    use the account name, since this code might be hit
    by multiple accounts.
--->
<cflock
    name="cfpop-bennadel"
    type="exclusive"
    timeout="40">
    <!--- Gather the email headers. --->
    <cfpop
        action="getheaderonly"
        name="qHeader"
        attributecollection="#CFPopAttributes#"
        />
    <!---
        Get all the emails whose subject is the POST command.
        Remember, ColdFusion is not case sensitive, but Query
        of Queries are, so we need to lowercase the compare.
    --->
    <cfquery name="qPostEmail" dbtype="query">
        SELECT
            subject,
            uid,
            [date]
        FROM
            qHeader
        WHERE
            <!--- QoQ is case sensitive. --->
            LOWER( subject ) = 'post'
        ORDER BY
            [date] ASC
    </cfquery>
</cflock>
<!--- Loop over post emails. --->
<cfloop query="qPostEmail">
    <!---
        Since dealing with CFPOP can be a very slow process,
        especially when downloading attachments, let's wrap each
        of the subsequent downloads in a CFThread tag. This will
        allow them to process asynchronously, and our page will
        not have to worry much about timing out.
        NOTE: Since the threads fire off asynchronously, the
        QUERY LOOP we are currently in will have NO CONTEXTUAL
        VALUE!!! This means we have to pass the uid into the
        thread context.
    --->
    <cfthread
        action="run"
        name="cfpop-#qPostEmail.uid#"
        uid="#qPostEmail.uid#">
        <!---
            Here's where it gets a bit tricky. Since our
            CFThreads are running asynchronously, it is possible
            that we will try to make several parallel CFPop
            requests. Unfortunately, on my POP account, that is
            not possible. So, as much as we want to run this
            async to our primary thread, the individual threads
            DO need to run in a serialized fashion. Apply the
            same NAMED lock we used above. Since threads don't
            time out, don't worry about giving this lock a large
            timeout value.
        --->
        <cflock
            name="cfpop-bennadel"
            type="exclusive"
            timeout="300">
            <!---
                Get the FULL email information for this email,
                using the uid value that was passed into the
                thread. This will contain the file attachment
                paths as well as the body. We have to supply the
                attachmentPath attribute in order for the
                downloads to take place.
            --->
            <cfpop
                action="getall"
                name="qMail"
                uid="#ATTRIBUTES.uid#"
                attachmentpath="#ExpandPath( './files/' )#"
                generateuniquefilenames="true"
                attributecollection="#CFPopAttributes#"
                />
            <!---
                If any files were downloaded, they will be in a
                tab-delimited list of expanded paths in the
                AttachmentFiles column. Loop over that list.
            --->
            <cfloop
                index="strFilePath"
                list="#qMail.attachmentfiles#"
                delimiters="#Chr( 9 )#">
                <!---
                    Now that we have the full file path, we can
                    process it in any way that we want. For our
                    demo, we will log the time of the file
                    download to a local text file.
                --->
                <cffile
                    action="append"
                    file="#ExpandPath( './download_log.txt' )#"
                    output="#TimeFormat( Now(), 'hh:mm:ss TT' )# : #GetFileFromPath( strFilePath )#"
                    addnewline="true"
                    />
            </cfloop>
            <!---
                Now that we have downloaded this email, let's
                delete it from the server so we don't process
                it again the next time.
            --->
            <cfpop
                action="delete"
                uid="#ATTRIBUTES.uid#"
                attributecollection="#CFPopAttributes#"
                />
        </cflock>
    </cfthread>
</cfloop>
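Two of the data-handling steps in the demo can be sketched in Python. The first is the case-insensitive header filter done with Query of Queries. The second is walking the tab-delimited AttachmentFiles value and building the log line. This sketch assumes the header rows arrive as a list of dicts with `subject`, `uid`, and `date` keys, and all sample data is made up. The helper names are hypothetical.

```python
import re
from datetime import datetime

def select_post_emails(headers):
    """Keep rows whose subject is 'post' in any case, oldest first.

    Same intent as: WHERE LOWER( subject ) = 'post' ORDER BY [date] ASC
    """
    matches = [h for h in headers if h["subject"].lower() == "post"]
    return sorted(matches, key=lambda h: h["date"])

def attachment_file_names(attachment_files):
    """Split a tab-delimited AttachmentFiles value into bare file names.

    Empty elements are skipped, since ColdFusion list loops ignore them.
    The name is whatever follows the last / or \\ separator.
    """
    paths = [p for p in attachment_files.split("\t") if p]
    return [re.split(r"[\\/]", p)[-1] for p in paths]

def log_line(when, file_name):
    """Format a log entry like TimeFormat( Now(), 'hh:mm:ss TT' )."""
    return f"{when.strftime('%I:%M:%S %p')} : {file_name}"

headers = [
    {"uid": "a", "subject": "POST", "date": datetime(2000, 1, 1, 8, 5)},
    {"uid": "b", "subject": "hello", "date": datetime(2000, 1, 1, 8, 0)},
    {"uid": "c", "subject": "post", "date": datetime(2000, 1, 1, 8, 1)},
]
print([h["uid"] for h in select_post_emails(headers)])  # ['c', 'a']

files = "C:\\site\\files\\photo.jpg\t\tC:\\site\\files\\photo1.jpg"
print(attachment_file_names(files))  # ['photo.jpg', 'photo1.jpg']
print(log_line(datetime(2000, 1, 1, 8, 21, 30), "photo.jpg"))  # 08:21:30 AM : photo.jpg
```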
The CFPop demo code really leverages the advantages that ColdFusion 8 has given us. For starters, I am defining the majority of my CFPop server settings in a structure that then gets passed into the subsequent CFPop tags using the AttributeCollection attribute. Notice that my CFPop tags use a combination of the AttributeCollection and inline tag attributes; I think this has only become available in the 8.0.1 updater (and I think you can see how powerful it is). Then, of course, we are using CFThread, which is new to ColdFusion 8.
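Python's closest analogue to AttributeCollection is unpacking a shared dict of keyword arguments next to per-call arguments. In this sketch `cfpop()` is a hypothetical stand-in that simply echoes what it receives, and every setting is a placeholder. The sketch shows only the "shared settings plus inline extras" shape. It makes no claim about how ColdFusion handles a key supplied both ways.

```python
# Shared connection settings, playing the role of the CFPopAttributes struct.
# Every value here is a placeholder.
POP_SETTINGS = {
    "server": "pop.example.com",
    "port": 110,
    "username": "user",
    "password": "secret",
}

def cfpop(action, server, port, username, password, **extra):
    """Hypothetical stand-in for a CFPop call; echoes its non-secret inputs."""
    return {"action": action, "server": server, "port": port, **extra}

# Shared settings and call-specific arguments, side by side.
header_call = cfpop("getheaderonly", name="qHeader", **POP_SETTINGS)
mail_call = cfpop("getall", uid="101", generateuniquefilenames=True, **POP_SETTINGS)

# Python refuses a key given both inline and through the dict.
# This is Python behavior only, not a statement about ColdFusion.
duplicate_rejected = False
try:
    cfpop("getall", port=995, **POP_SETTINGS)
except TypeError:
    duplicate_rejected = True

print(header_call)
```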
Each of the email attachment downloads gets logged to a local text file, and then the email in question is deleted from the POP server. After running some tests on the code, my log file looks something like this:
08:21:30 AM : 533826955_f50654b0b0_b3.jpg
08:21:33 AM : 159218030_33eb715106.jpg
08:21:38 AM : 421163915_deaaa5a735_o.jpg
08:21:38 AM : 18882044_67fdc50ec0_o.jpg
08:23:18 AM : 31957216_8a9d0782e1.jpg
08:23:18 AM : 533826955_f50654b0b0_b.jpg
08:23:22 AM : 159218030_33eb7151061.jpg
08:23:27 AM : 421163915_deaaa5a735_o1.jpg
08:23:27 AM : 18882044_67fdc50ec0_o1.jpg
08:25:46 AM : 31957216_8a9d0782e11.jpg
08:25:46 AM : 533826955_f50654b0b0_b1.jpg
08:25:51 AM : 421163915_deaaa5a735_o2.jpg
08:25:51 AM : 18882044_67fdc50ec0_o2.jpg
08:25:56 AM : 159218030_33eb7151062.jpg
08:26:03 AM : 500010452_6c937f7a2c_o.jpg
08:26:03 AM : 221285527_ffd86b7f5b.jpg
Notice that the files logged within the same minute (e.g., 8:25) are several seconds apart. This demonstrates that the CFPop process is relatively slow; we are transferring files across the internet, and that takes time. This is why we go through the trouble of using CFThread and CFLock.
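The log also shows the effect of `generateuniquefilenames="true"`. When a name is already taken in the files directory, the new download gets a number appended before the extension: 159218030_33eb715106.jpg, then 159218030_33eb7151061.jpg, then 159218030_33eb7151062.jpg. The Python below imitates that observed pattern. It is a sketch, not ColdFusion's actual implementation. The `exists` parameter is an assumption added so that the example can run without touching the disk.

```python
import os

def unique_file_name(directory, file_name, exists=os.path.exists):
    """Return file_name, or file_name with 1, 2, 3... before the extension.

    Mimics the naming pattern visible in the download log; the check-then-
    write approach is only safe because the downloads are serialized.
    """
    stem, ext = os.path.splitext(file_name)
    candidate = file_name
    counter = 0
    while exists(os.path.join(directory, candidate)):
        counter += 1
        candidate = f"{stem}{counter}{ext}"
    return candidate

# Pretend these two files are already in the "files" directory.
taken = {
    os.path.join("files", "159218030_33eb715106.jpg"),
    os.path.join("files", "159218030_33eb7151061.jpg"),
}
result = unique_file_name("files", "159218030_33eb715106.jpg",
                          exists=lambda p: p in taken)
print(result)  # 159218030_33eb7151062.jpg
```

Because the named lock serializes the downloads, two threads never race between the existence check and the write. Without that lock, this simple approach could hand the same name to two files.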
There are several important caveats to keep in mind, but all in all, downloading email attachments with ColdFusion and CFPop is pretty straightforward. I am hoping to do some really fun stuff with this in the near future.
Want to use code from this post? Check out the license.
Reader Comments
Hi Ben,
Hope you are well. Love the site revamp; I know it's been like this for a while ;)
Thanks for this article. The cfthread and cflocking really helped solve an issue I was having with downloading messages into a db and a heap of image resizing I was doing on the attachments. I was getting blank messages in my db and then duplicate images, boohoo. This was due to the slowness of the cfpop tag and the sizes of the images.
Cfthread has removed all of these issues. It is still slow, but that is a small price to pay for stability. :)
Thanks Ben.
Jose
Ben - great article & great example - I've never had to process attachments from cfpop before, need to now, did one google search and found this and it totally nails the how-to & the performance issues in one shot! Thanks Ben, I always know I can count on you! And no inappropriate filenames or sample code! Totally awesome :-)
@Jose, @Jon,
Glad you're liking it. I haven't had to do too much work with POP lately. With CF9's new CFIMAP, however, I'll try to come up with some fun email type ideas.
Ben,
As always, great article!
Every time I go to the collective (Google) to see if someone has already done the grunt work for me for a generic problem, I always seem to find one of your blog posts that lays out the groundwork to point me in the right direction!
That being said, I ran into an issue with cfpop and email attachments and was wondering if you or anyone else ran into this.
If the attachment name contains a colon ":", the attachment is not downloaded. In some cases the filename is shown in the Attachments column of the query result, but the AttachmentFiles column does not contain a path to this file.
This is on a Windows server, where a colon ":" is not an allowed character in a file name...
Does anyone know of a way around this, or do I have to resort to a third-party POP solution?
I should also note that I am stuck on cf8 for the time being.
Thanks (and sorry for the comment thread hijack)!
--Ken
@Ken,
Ha ha, thanks :D As far as your problem, I have not run into that before, but as far as legal file names, I guess that makes sense. So no errors are being thrown? It simply doesn't download anything? That's really frustrating.
Ben,
Yea, the cfpop command works without throwing an error, I just get a different number of items in the attachments and attachmentfiles columns.
I am working around this for now, but I think in the future I will try a third party pop solution...
Thanks!
@Ken,
What is the work-around, if I may ask? Or are you just ignoring files with invalid Windows characters?
Unfortunately, I don't have a fancy technology based workaround here...
I am importing all emails with attachments where the number of attachments matches up, and then deleting each of those emails from this account's inbox.
So any remaining messages in the inbox are ones that need to be processed manually.
So the client has a process setup to manually check this inbox a few times per day.
They are happy as the automated system deals with over 90% of the emails, and they are working on training the email senders to get rid of the rest of the manual process by naming files correctly.
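Ken's count-matching workaround can be sketched in a few lines of Python. The sketch assumes that each message is a dict holding the raw tab-delimited Attachments and AttachmentFiles values from one CFPop getall row. The function names and sample data are hypothetical. A message whose counts agree is processed automatically, and anything else is left for a person to handle. Messages with no attachments count as matching here.

```python
def split_list(value):
    """Tab-delimited CFPop column -> list, ignoring empty elements."""
    return [item for item in value.split("\t") if item]

def triage(messages):
    """Return (automatic, manual) UID lists by comparing attachment counts."""
    automatic, manual = [], []
    for msg in messages:
        names = split_list(msg["attachments"])
        saved = split_list(msg["attachmentfiles"])
        if len(names) == len(saved):
            automatic.append(msg["uid"])
        else:
            manual.append(msg["uid"])  # something was not saved to disk
    return automatic, manual

messages = [
    {"uid": "1", "attachments": "invoice.pdf\tphoto.jpg",
     "attachmentfiles": "C:\\files\\invoice.pdf\tC:\\files\\photo.jpg"},
    {"uid": "2", "attachments": "report 10:30.pdf",
     "attachmentfiles": ""},
]
automatic, manual = triage(messages)
print(automatic, manual)  # ['1'] ['2']
```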
@Ken,
Well, hopefully after the training, the problem should disappear (fingers crossed).
Hi Ben,
Not sure if you still pick these threads up, but I have written a little script for a site I'm working on whereby you can email a picture to a designated address, and a scheduled task checks the inbox every 5 minutes to see if there is an email. If there is, it downloads the attachment and then, using some basic cfimage stuff, takes the image, thumbnails it, and stores it and some info in the db. This all works perfectly at the moment using the cfpop stuff that you have amazingly shared above.
All good, and for 90% of emails it works fine. However, if I use my iPhone 4 and send a picture in portrait mode, the image always seems to come through rotated 90 degrees to the right. Landscape ones are fine. It's really irritating, as I have no idea why it is happening. Checking the downloaded attachment, before I do any image manipulation, shows that the image is already wrong (if it was originally portrait).
Any ideas at all? or ideas how to start to figure this one out?
Thank you so much for the blog its awesome!
Mike,
I don't have an iPhone to check on, but are you sure that the iPhone is saving the picture in the orientation that you think it is?
I have seen pictures taken with some other smart phones that when the picture is viewed in the phone it is displayed "correctly" and even when sent to a web email service like yahoo or gmail, the picture is rotated correctly, but if you save the image to your computer and open it in a photo editing program, it was actually rotated 90 degrees from what you would expect.
When I ran into this in the past, my best guess was that the web email programs were looking at the image's dimensions and guessing that the photo should be turned to view it correctly, making it seem correct even though it was not...
HTH!
I have come across an odd problem using CFPOP with an Exchange Server.
The client sends us an email containing several files that we need to process (decryption).
The name of the file needs to be preserved.
So I found that once we entered production mode with the application, the files were all coming in with file names like ATT1380172800453.att instead of the expected 130923F1.zip.pgp
At this stage I do not know what email client they are using; but it is probably outlook.
To get the system up and running for testing purposes, we re-queue the message on the Exchange server. When we do that, the attachments come out correctly named.
Clearly there is some issue interpreting the attached file names.
It may be pertinent to add that we are using ColdFusion 9 on Windows Server 2008 R2 (x64).
Is this a bug? How can I make certain to get the correct file names when the file is received the first time?
Cheers,
Bryn Parrott
I am able to download .msg files to my local hard disk, but I need to download the attachments from these .msg files. How can I do so?
Please help.