Using fileGetMimeType() To Determine File Type In ColdFusion
This morning, in a discussion about inspecting file upload contents within the temp directory, Brian Reilly taught me that there is a native ColdFusion function for determining a given file's mime-type: fileGetMimeType()
. This function—when operating in the default "strict mode"—will inspect the contents of a given file and return the true mime-type, regardless of which file extension is being used. I can't believe this has existed since ColdFusion 10 and I didn't know about it!
To see fileGetMimeType()
in action, I'm going to create two files: one is a native .txt
file and one is a native .pdf
file. Then, I'm going to copy each of these files into another file with an incorrect file extension and see which mime-type is returned:
<cfscript>
txtFile = expandPath( "./files/text.txt" );
fakeTxtFile = expandPath( "./files/fake-text.txt" );
pdfFile = expandPath( "./files/doc.pdf" );
fakePdfFile = expandPath( "./files/fake-doc.pdf" );
// Copy REAL text file into FAKE pdf.
fileCopy( txtFile, fakePdfFile );
// Copy REAL pdf file into FAKE txt.
fileCopy( pdfFile, fakeTxtFile );
</cfscript>
<cfoutput>
<strong>Text Files (.txt)</strong>
<dl>
<dt>
Real → #getFileFromPath( txtFile )#:
</dt>
<dd>
#fileGetMimeType( txtFile )#
</dd>
<!--- Really a PDF disguised as a TXT file. --->
<dt>
Fake → #getFileFromPath( fakeTxtFile )#:
</dt>
<dd>
#fileGetMimeType( fakeTxtFile )#
</dd>
</dl>
<strong>PDF Files (.pdf)</strong>
<dl>
<dt>
Real → #getFileFromPath( pdfFile )#:
</dt>
<dd>
#fileGetMimeType( pdfFile )#
</dd>
<!--- Really a TXT disguised as a PDF file. --->
<dt>
Fake → #getFileFromPath( fakePdfFile )#:
</dt>
<dd>
#fileGetMimeType( fakePdfFile )#
</dd>
</dl>
</cfoutput>
I now have a PDF with a .txt
file extension and a text file with a .pdf
file extension. And, when we run this ColdFusion code, we get the following output:
As you can see, the correct mime-type is being returned regardless of which file extension is currently in use. For this, ColdFusion must inspect the contents of the file. Since Adobe ColdFusion is closed-source, I have no idea how it's doing this; but, since Lucee CFML is open-source, we can see on GitHub that they are using the Apache Tika project (for their CFML-specific implementation).
On the Tika project page, they say that sometimes they can look for "magic bytes" in the file binary; and, other times, they have to do some more fuzzy matching. We can see this fuzzy matching in action by generating files with random bytes (ie, that have no magic bytes):
<cfscript>
randomFile = expandPath( "./files/random.bytes" );
// Write nothing but control characters.
fileWrite( randomFile, javaCast( "byte[]", [ 1, 2, 3, 4, 5, 6, 7 ] ) );
writeDump( fileGetMimeType( randomFile ) );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
// Write nothing but ASCII characters
fileWrite( randomFile, javaCast( "byte[]", [ 32, 42, 52, 62 ] ) );
writeDump( fileGetMimeType( randomFile ) );
</cfscript>
When we run this ColdFusion code, we get two different mime-types reported:
application/octet-stream
- for the file that contained nothing but control characters.text/plain
- for the file that contained nothing but ASCII characters.
So, even though we don't know exactly what ColdFusion is doing under the hood, we can see that they are using the actual byte content of the file to determine the mime type. This is awesome!
By default, the fileGetMimeType()
function runs in strict mode, which means that it inspects the file content. If you override this behavior, and run it in non-strict mode (passing false
in as the second argument), it will only look at the file name. And, if the file name contains an unrecognized extension, it will just return application/octet-stream
.
Want to use code from this post? Check out the license.
Reader Comments
"Trust, but verify." -Russian proverb
My mime type unit test contains 409 different file extensions, but I don't use any physical test files. The
fileGetMimeType
function requires a physical file even when the 2nd parameter is false. (If false & detection is based solely on the file name, why does CF require a physical file to be present?) I often use ColdFusion to generate a file in-memory, serve it using CFContent and have to also include a mime type. I don't want to be required to first save it to needlessly to the file system in order to identify the mime type.We've expanded the UDF from CFLib called getMimeType and use it for all internally-hosted files that we serve via CFContent.
When comparing file extension mime results using ColdFusion 2016's fileGetMimeType and our expanded getMimeType UDF:
When it comes to ACF's file detection, YMMV. I've encountered images that will return
true
when using isImageFile, but then throw an error when attempting to be modified by CFImage. I've also encountered instances where isPDF will returntrue
and then throw an error when the same file is used with PDF-related functions. As a result, we've had to leverage alternative command-line libraries to fill in the gaps to finish the job.@James,
I suppose this stuff isn't an exact science. But, I agree that it seems strange that you should need a physical file to test this stuff, especially when you the loose/non-strict mode is just looking at the file extension. That said, I mostly deal with image files, and it seems to be decent at that. Though, to be fair, I haven't tested it with many different types (such as webp or tiff or any of the other esoteric types).
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →