Moving 5-Years Of Files Into Year-Based Folders In Adobe ColdFusion 2021
As I mentioned before, I've been putting time into modernizing my ColdFusion blogging infrastructure. And, as part of that gesture, I wanted to clean-up some of my file-uploads. Today, when I upload a file for use in a blog post, I upload it into a year-based folder. However, I only started doing that in 2012. Which means, for the first 5 years, I was uploading all files into one root folder. Junky monkey! This morning, I finally got around to moving those early files into yearly buckets; and, more importantly, updating all the blog content in my database to reflect the new paths. Since writing ColdFusion is hella fun, I thought I would share the approach.
At first, when I started to write the migration code, I was going to try and use MySQL's RLIKE
operator for Regular Expression pattern matching. This way, I could try and find paths that excluded certain sub-folders. But, the code started to become too complicated. And, I decided to end up brute-forcing it.
Instead of trying to optimize the amount of data I pulled back from the database, I just pulled back any record that had an upload path; and then performed some additional checks / validation in the ColdFusion code.
The plan was relatively simple:
- Back-up database table before touching it!!!!
- Pull back any post with
/resources/upload/
in the content. - For each post:
- Extract all the
src
andhref
attributes that point to an upload. - Skip over any path that is already in a year-based folder.
- Copy the file to a year-based folder.
- Update the entry record for the new path.
- Move the original file to a back-up location for manual review (and eventual deletion).
- Extract all the
Note that I am not strictly deleting anything; instead, I am moving the original file to a backup location just in case. Deleting stuff is scary!!
In the end, here's the code I came up with:
<cfscript>
cfsetting( requestTimeout = ( 60 * 10 ) );
nextID = 0;
// We're going to iterate over the blog entries, one by one, looking for content that
// contains embedded upload paths. Then, we're going to move those files into year-
// based buckets and update the entry content with the new path.
do {
entry = getNextEntry( nextID );
// If there are no more entries, we're done - short-circuit the loop iteration,
// which will then exit on account of the while-condition.
if ( ! entry.recordCount ) {
continue;
}
// It's possible that the upload path is part of either an IMG SRC attribute or
// an ANCHOR HREF attribute. Extract the possible matches. I'm not sure if early
// paths were fully-qualified, so we're making the domain optional.
paths = entry.content_markdown
.reMatch( "(src|href)=""(https?://www.bennadel.com)?/resources/uploads/[^""]+""" )
;
for ( path in paths ) {
// Skip uploads that are already in a year-based folder.
if ( path.reFind( "/resources/uploads/(20\d\d)/" ) ) {
continue;
}
// Extract the path from the HTML attribute. Since our value is a quoted
// attribute value, we can treat the pattern match like a quote-delimited
// list.
path = path.listGetAt( 2, """" );
// Strip off any HTTP and domain prefix.
localPath = path.reReplace( "^https?://[^/]+", "" );
// Calculate the path that I would have used for the year-based bucketing of
// the given file path.
targetYear = entry.date_created.dateFormat( "yyyy" );
targetPath = localPath.replace( "/uploads/", "/uploads/#targetYear#/" );
// Get the fully-qualified server-paths for our files and directories.
// --
// NOTE: While Lucee CFML can use file-relative paths, Adobe ColdFusion has
// to operate on fully-qualified paths (at least the last time I tested it).
fullLocalPath = expandPath( "../.." & localPath );
fullTargetPath = expandPath( "../.." & targetPath );
fullTargetDirectory = getDirectoryFromPath( fullTargetPath );
// After we copy the files to the new, bucketed location, we're going to MOVE
// the original files to a temporary location for review (then delete them
// manually because deleting is HELLA SCARY!!!).
movedPath = localPath.replace( "/uploads/", "/uploads/__moved__/" );
fullMovedPath = expandPath( "../.." & movedPath );
fullMovedDirectory = getDirectoryFromPath( fullMovedPath );
// Log the path on which we are operating.
writeOutput(
entry.id &
" → " &
targetYear &
" → " &
encodeForHtml( localPath ) &
" → " &
encodeForHtml( targetPath ) &
"<br />"
);
// Since the blog entry content may be inaccurate, double-check that the file
// actually exists. If not, warn and skip.
if ( ! fileExists( fullLocalPath ) ) {
writeOutput( "<strong>CAUTION:</strong> Doesn't exist <br />" );
continue;
}
// Ensure the new target directory exists.
// --
// NOTE: ColdFusion 2021 will gracefully handle nested directories that don't
// exist. As such, we don't have to worry about deep paths here.
if ( ! directoryExists( fullTargetDirectory ) ) {
directoryCreate( fullTargetDirectory );
}
// In case we run this script multiple times (or something may have been
// buggy), only move the file if it doesn't exist in the target location.
if ( ! fileExists( fullTargetPath ) ) {
fileCopy( fullLocalPath, fullTargetPath );
}
// YOLO!! Commit the path changes to the database!
updateEntry( entry.id, localPath, targetPath );
// Now that we've updated our content, we can safely move the old files into
// the temporary "__moved__" directory where I can review them before one
// final delete!
if ( ! directoryExists( fullMovedDirectory ) ) {
directoryCreate( fullMovedDirectory );
}
if ( ! fileExists( fullMovedPath ) ) {
fileMove( fullLocalPath, fullMovedPath );
}
}
nextID = ( entry.id + 1 );
} while ( entry.recordCount );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I look for the next blog entry that has "resource/uploads" path.
*/
public query function getNextEntry( required numeric minID ) {
// NOTE: I'm using date-range because I know I started using folder-based
// bucketing in 2011; so, any entry posted in 2012 or later should be OK.
var results = queryExecute(
"
SELECT
e.id,
e.name,
e.content_markdown,
e.date_created
FROM
blog_entry e
WHERE
e.id >= :id
AND
e.date_created < '2012-01-01'
AND
e.content_markdown LIKE '%/resources/uploads/%'
LIMIT
1
",
{
id: {
value: minID,
cfsqltype: "integer"
}
}
);
return( results );
}
/**
* I update the content for the given entry, replacing root upload paths with the
* given bucket-upload path.
*/
public void function updateEntry(
required numeric id,
required string localPath,
required string targetPath
) {
var results = queryExecute(
"
UPDATE
blog_entry
SET
content = REPLACE( content, :fromPath, :toPath ),
content_markdown = REPLACE( content_markdown, :fromPath, :toPath ),
formatted_content = REPLACE( formatted_content, :fromPath, :toPath )
WHERE
id = :id
LIMIT
1
",
{
id: {
value: id,
cfsqltype: "cf_sql_integer"
},
fromPath: {
value: localPath,
cfsqltype: "cf_sql_varchar"
},
toPath: {
value: targetPath,
cfsqltype: "cf_sql_varchar"
}
}
);
}
</cfscript>
As you can see, it's just a brute-force nested list. Sometimes (often times) brute force is just the best option. You can worry about making it elegant; or, you can worry about getting it done!
And while I didn't end up using RLIKE
in MySQL, no solution would truly be complete without a copious amount of Regular Expressions. You'll note that I made good use of pattern-matching when extracting and transforming the paths. As I've said a million times before, learn you some Regular Expressions for great good! They will change your life!
Anyway, just some celebration of how good life can be when you have ColdFusion in it! And finally, I can breathe a sigh of relief when I look at the nicely striped file-system.
Want to use code from this post? Check out the license.
Reader Comments