Learning ColdFusion 8: CFZip Part IV - Extracting Zip File Archives
In the last part of this series, we looked at how to list and read the files of a zip archive using ColdFusion 8's new CFZip and CFZipParam tags. Those techniques alone could be used to fully unzip an archive, but certainly, it wouldn't be the best choice. Luckily, ColdFusion 8's CFZip tag provides us with the Unzip action which can unzip an entire archive, or subsections of an archive, with the greatest of ease.
As always, before we can start exploring the unzipping of an archive in ColdFusion 8, we have to create a zip archive to work with. We are going to zip the following directory structure:
./data/documents/manual.txt
./data/documents/readme.txt
./data/images/funny.jpg
./data/images/mud_monster.jpg
./data/images/red_face.jpg
./data/images/smile.jpg
To create the zip, we are going to use the simple ColdFusion 8 CFZip tag:
<!---
Create a zip archive of the data directory.
By default, ColdFusion 8 will recurse the
source directory and store the entry paths.
And, due to our use of the Overwrite attribute,
we will make sure we create a new zip archive.
--->
<cfzip
action="zip"
source="#ExpandPath( './data/' )#"
file="#ExpandPath( './data.zip' )#"
overwrite="true"
/>
This will put all contents of the data directory into the root of our zip archive (the documents and images directories). Now that we have a zip, the easiest thing we can do is reverse the process by unzipping the entire archive using the unzip action:
<!---
Unzip the zip archive into the directory
named "unzipped". The unzipped directory must
exist before we perform this action.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped/' )#"
/>
Here, we are using the two required attributes that go with the Unzip action. The File attribute is the absolute path to the zip archive we are going to unzip. The Destination attribute is the absolute path to the directory into which we are going to unzip the archive contents. This directory must exist before you try to reference it; if you try to unzip the archive into a non-existent directory, ColdFusion 8 will throw the following error:
The destination G:\....\cf8\zip\unzipped specified in the cfzip tag is invalid. The destination must be a directory and should be accessible by this tag.
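One way to sidestep this error is to create the destination directory on demand before unzipping. The following is only a minimal sketch, assuming the same data.zip archive as above; the destinationPath variable is a name introduced here purely for illustration:

```cfml
<!---
	Hypothetical variable (not part of the original
	examples) holding our target directory.
--->
<cfset destinationPath = ExpandPath( "./unzipped/" ) />

<!--- Create the directory only if it does not exist yet. --->
<cfif NOT DirectoryExists( destinationPath )>
	<cfdirectory
		action="create"
		directory="#destinationPath#"
		/>
</cfif>

<!--- Now the destination is guaranteed to be there. --->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#destinationPath#"
	/>
```

Checking with DirectoryExists() first keeps the CFDirectory create action from complaining about a directory that is already in place.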
By default, ColdFusion 8 will unzip the entire archive, keeping the archive directory structure (entry path structure) as is, and will not overwrite files that already exist in the destination directory. But, by using some of the optional attributes of the CFZip tag, we can change the way things happen.
The StorePath attribute, which defaults to True, is what determines whether or not we keep the entry path structure. If directory structure is not important to us and we want to unzip all the entries directly into the root of the destination folder, all we need to do is set StorePath to false. Running this code:
<!---
Unzip the zip archive into the directory
named "unzipped_flat". Instead of keeping the
archive entry path structure, we are going to
unzip all of the entries directly into the
root of our target directory.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_flat/' )#"
storepath="false"
/>
... will leave us with an unzipped_flat directory that looks like this:
./funny.jpg
./manual.txt
./mud_monster.jpg
./readme.txt
./red_face.jpg
./smile.jpg
When we flatten a zip archive, as we just did above, one of the things we have to be careful of is possible naming conflicts that might be caused by like-named files at different entry paths being merged into the destination root. For example, if we added this text file entry:
./images/readme.txt
... with the content:
This is the IMAGES readme file.
... to the data archive, when flattened, the readme.txt file in the documents folder would conflict with the readme.txt we just added to the images folder. By default, ColdFusion 8 will not overwrite any existing files in the destination directory, and as such, the documents readme.txt file will be the only readme.txt file that gets unzipped. Since unzipping happens in a depth-first fashion, the images readme.txt file will be examined only after the documents readme.txt file, and since there is already a readme.txt in the root of the destination, it will not be unzipped. ColdFusion 8 will not throw an error over this; it will simply skip the current archive entry.
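If you want to know about such collisions before flattening an archive, one option is to list the archive first and look for repeated file names. This is a rough sketch rather than a definitive approach; the variable names (entries, seenNames, conflicts, fileName) are made up for this example, and it assumes the name column of the List query holds the full entry path:

```cfml
<!--- Get a query of the file entries in the archive. --->
<cfzip
	action="list"
	file="#ExpandPath( './data.zip' )#"
	name="entries"
	/>

<!--- Hypothetical helpers for tracking file names. --->
<cfset seenNames = StructNew() />
<cfset conflicts = ArrayNew( 1 ) />

<cfloop query="entries">

	<!---
		Grab just the file name portion of the entry
		path (the last item in the slash-delimited list).
	--->
	<cfset fileName = ListLast( entries.name, "/\" ) />

	<cfif StructKeyExists( seenNames, fileName )>
		<!--- This name was already used by an earlier entry. --->
		<cfset ArrayAppend( conflicts, entries.name ) />
	<cfelse>
		<cfset seenNames[ fileName ] = entries.name />
	</cfif>

</cfloop>

<!--- Output any entries that would collide when flattened. --->
<cfdump var="#conflicts#" label="Flattening conflicts" />
```

Struct keys are not case sensitive in ColdFusion, so this treats readme.txt and README.TXT as the same name, which matches how a Windows file system would see them.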
By setting the Overwrite attribute to True, we can get ColdFusion to overwrite any files that already exist. Therefore, running this code:
<!---
Unzip the zip archive into the directory
named "unzipped_flat". Instead of keeping the
archive entry path structure, we are going to
unzip all of the entries directly into the
root of our target directory. As we merge the
entries, we are going to overwrite them, thereby
keeping only the last version of all the
like-named files we come across.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_flat/' )#"
storepath="false"
overwrite="true"
/>
... will still unzip the images readme.txt after the documents readme.txt, but this time, the images readme.txt file will overwrite the one from the documents entry path.
Using the Recurse attribute, which defaults to True, we can get CFZip to only extract a single directory. Running this code:
<!---
Unzip the zip archive into the directory
named "unzipped_root". Instead of recursing
through the entire archive, just unzip the
root directory.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_root/' )#"
recurse="false"
/>
... will unzip only the root of the zip archive into the directory, unzipped_root. However, since there are no files in the root of our archive (only our two directories - documents and images), our unzipped_root directory remains empty.
If we don't want to recurse, but we also don't want to unzip the root of the archive, we can use the optional EntryPath attribute to get at a subdirectory of the archive. If we wanted to unzip just the images folder, we would run this code:
<!---
Unzip the archived images folder into the
directory named "unzipped_images". By not storing
the path of the entry, we will ensure that an
"images" folder does not get created.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_images/' )#"
entrypath="images"
recurse="false"
storepath="false"
/>
This would leave us with an unzipped_images directory that looks like this (we are no longer dealing with the readme.txt in the images directory):
./funny.jpg
./mud_monster.jpg
./red_face.jpg
./smile.jpg
Now, there's actually a bunch of things happening synergistically in the code that we just ran. We turned off directory recursion so that if images had a subdirectory, it would be ignored. We then told CFZip not to store the entry paths. This is to make sure we don't end up with an "images" directory inside of our unzipped_images destination directory; by default, since the image entries live inside an images archive folder, CFZip would create that images folder and unzip the image entries into it. Finally, to make sure we are unzipping just the images archive folder, we use the optional EntryPath attribute to point the action at the images folder.
The EntryPath attribute acts a bit differently than it did when we examined it in the context of reading archive entries. When reading an archive entry, you cannot use the "./" and "/" leading path constructs or ColdFusion 8 will throw an error. When it comes to unzipping an archive, you still cannot use the "./" or "/" leading path constructs. Additionally, you cannot even use a trailing "/" character. The following EntryPath values are all invalid:
/images/
./images/
images/
The difference, when unzipping an archive, is that ColdFusion 8 will not throw any errors. The above paths will simply not work. In order to properly define a target directory, you must exclude both leading and trailing path slash constructs.
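Because a bad EntryPath fails silently here, it can be worth normalizing the value before handing it to CFZip. Below is a small sketch of that idea; rawEntryPath and cleanEntryPath are hypothetical variable names, and the regular expression is just one possible way to strip the offending characters:

```cfml
<!--- Hypothetical, possibly user-supplied, entry path. --->
<cfset rawEntryPath = "./images/" />

<!---
	Strip a leading "./" or "/" as well as any
	trailing "/" characters from the entry path.
--->
<cfset cleanEntryPath = REReplace(
	rawEntryPath,
	"^\.?/+|/+$",
	"",
	"all"
	) />

<!---
	Unzip using the cleaned entry path. As before,
	the destination directory must already exist.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images/' )#"
	entrypath="#cleanEntryPath#"
	recurse="false"
	storepath="false"
	/>
```

With this in place, all three of the invalid values listed above should be reduced to the plain "images" form that the Unzip action expects.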
Now, we could have accomplished the same thing by using the optional Filter attribute. As we have covered in almost every other part of this series, the filter attribute uses file masks to limit the files that are included in the CFZip action. To reach the same outcome as above, we could have unzipped files of type JPG into the root of our destination folder:
<!---
Unzip all the archived images of type JPG into
the directory named "unzipped_images2". By not
storing the path of the entry, we will be unzipping
all files into the root of the destination directory.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_images2/' )#"
filter="*.JPG"
storepath="false"
/>
Up till now, we have been unzipping directories of files, but the EntryPath attribute can point to a single file as well. In the following code, we are going to unzip just the mud_monster.jpg image into the destination directory:
<!---
Unzip the mud_monster.jpg image into the directory
named "unzipped_single". By not storing the path
of the entry, we will make sure not to create the
images subdirectory in our destination folder.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_single/' )#"
entrypath="images/mud_monster.jpg"
storepath="false"
/>
Since we care only about mud_monster.jpg and not about the images folder itself, we are not storing the entry path structure. This will ensure that mud_monster.jpg goes into the root of our destination directory and not into an images folder within the root.
As with all the CFZip actions, unzipping an archive can be done using the CFZip tag in conjunction with one or more nested CFZipParam tags. As a simple example, we can mimic the unzipped images directory by moving the EntryPath, Recurse, and Filter attributes from the CFZip tag down into a CFZipParam tag:
<!---
Unzip all the archived images of type JPG located
in the images folder into the directory named
"unzipped_images3". By not storing the path of the
entry, we will be unzipping all files into the root
of the destination directory.
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped_images3/' )#"
storepath="false">
<!--- Unzip the images folder. --->
<cfzipparam
entrypath="images"
recurse="false"
filter="*.JPG"
/>
</cfzip>
Now, when you move attributes down into the CFZipParam tag, it doesn't always have to be one or the other. While the EntryPath and Filter attributes cannot be defined in both the CFZip and CFZipParam tags, the Recurse attribute can be defined in the CFZip tag and then overridden in the CFZipParam tags.
Furthermore, we don't just have to have one CFZipParam tag. We can use multiple CFZipParam tags to define highly dynamic unzipping algorithms. While not complicated in scope, we could mimic our first example (of unzipping the entire zip archive) by using two CFZipParam tags - one for the documents directory and one for the images directory:
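To make the Recurse behavior described above a little more concrete, here is a hedged sketch in which the CFZip tag turns recursion off and one of the CFZipParam tags turns it back on. The unzipped_mixed directory name is invented for this example and, like every destination, would need to exist beforehand:

```cfml
<!---
	Turn recursion off for the unzip action as a
	whole, then selectively turn it back on for
	the images folder only.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_mixed/' )#"
	recurse="false"
	overwrite="true">

	<!--- Uses the Recurse value set in the CFZip tag. --->
	<cfzipparam
		entrypath="documents"
		/>

	<!--- Overrides the Recurse value for this entry path. --->
	<cfzipparam
		entrypath="images"
		recurse="true"
		/>

</cfzip>
```

With our sample archive the outcome would look the same either way, since neither folder contains subdirectories; the override only matters once an entry path has nested folders beneath it.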
<!---
Unzip the documents and images archive
folders into the directory named "unzipped3".
--->
<cfzip
action="unzip"
file="#ExpandPath( './data.zip' )#"
destination="#ExpandPath( './unzipped3/' )#"
overwrite="true">
<!--- Unzip the documents folder. --->
<cfzipparam
entrypath="documents"
/>
<!--- Unzip the images folder. --->
<cfzipparam
entrypath="images"
filter="*.JPG"
/>
</cfzip>
And, of course, as with the CFZip tag, the EntryPath does not need to point to a directory; it can point to either a directory or a specific file.
ColdFusion 8 is just making all this stuff too easy. I would sum up how cool CFZip / CFZipParam tags are, but come on, it's the end of Part IV - if you don't get it yet, no summary is gonna do anything :)
Reader Comments
Is there a way to unzip a .gz file with the cfzip tag? I've been looking around the web a bit and can't seem to find a solution for it.
@Brett,
I am not sure on that one.
Well, I couldn't find the answer on that one either. But after some searching I did manage to find a component that I just tested and it works great.
I've posted the link to the download and the documentation in the event somebody finds a use for it as I have.
Download: http://download.newsight.de/Zip.zip
Documentation: http://livedocs.newsight.de/Zip/
Will this component work on cf mx7?
Thanks
Hi Ben,
again - in depth investigation into a CF TAG. Almost better than any CF documentation issued by ADOBE ;-)
josef
Hi, I have been using cfzip to unzip images. Then I use cfzip to list the archive so I can resize the images with cfimage.
My problem is when the images have spaces in the file name I get an error, cannot find file.
I understand how to rename a file upon upload to avoid this but in this case, the zip file name(file being uploaded) doesn't matter, it is the files within the zip file I want to rename, is it possible to rename these files as they are unzipped?
@Bill,
Hmm, I will have to take a look into this. I have not seen this before; but, it is very possible that I have never done any testing with spaces.
You might have to quote the path, but I can't imagine that they would have made that a requirement.
Ben,
I have two sites that I am trying to sync up the code base using a utility that I have written. The problem is that I have to use ftp to move the files. When I use ftp the name is modified on the target site. SO! I zipped the file first and ftp it to a holding directory where I then unzip it to the correct directory. In the zipped file the modified date is correct and in the zipped file at the other end. I can open the file and unzip it with Windows and it retains its modified date.
The Problem is that <cfzip action=unzip changes the modified date to the current date.
Is this the normal action of an unzip? Is there an option to override this?
Thanks!
David
Hello,
Is there a reason that you wrote a bunch of code to do the sync instead of using an application that is designed to do exactly what you are looking for?
I think that there is a 'freeware' version of SynBack for instance that gets pretty good reviews. The interface is a bit to get used to, but it does the job.
There are a number of such applications.
I tried/tested a lot of them and found that some were able to move and sync files considerably faster than others.
Unfortunately, I don't have the list to share at this time.
However, as much as I love ColdFusion, why reinvent the wheel?
Best regards,
Kevin Randolph
Kevin,
Thanks for the suggestion. I just resolved the issue. I built a custom one as we have 2 development environments, plus an Integration and Pre-Production environment on one server and Production on another server. We wanted a view of all of the files on any of the environments and the ability to move the file with a click.
I will check out the SynBack application.
I appreciate your input!
Regards,
David
@David, @Kevin,
To be honest, I haven't used CFZip in a while and I'm not sure what the normal behavior of the date would be.
That said, and to @Kevin's point, one of my favorite applications of ALL TIME is "Beyond Compare". Unfortunately, they don't have it for Mac; but, on Windows, it's the cat's pajamas. And, I think it supports FTP syncing, which is cool if you like to manually curate the file sync.
Sorry my answer doesn't speak to your problem more directly.