Using fileGetMimeType() To Determine File Type In ColdFusion

By Ben Nadel

Published 2024-11-16 in ColdFusion — Comments (2)

This morning, in a discussion about inspecting file upload contents within the temp directory, Brian Reilly taught me that there is a native ColdFusion function for determining a given file's mime-type: fileGetMimeType(). This function—when operating in the default "strict mode"—will inspect the contents of a given file and return the true mime-type, regardless of which file extension is being used. I can't believe this has existed since ColdFusion 10 and I didn't know about it!

To see fileGetMimeType() in action, I'm going to create two files: one is a native .txt file and one is a native .pdf file. Then, I'm going to copy each of these files into another file with an incorrect file extension and see which mime-type is returned:

<cfscript>

	txtFile = expandPath( "./files/text.txt" );
	fakeTxtFile = expandPath( "./files/fake-text.txt" );

	pdfFile = expandPath( "./files/doc.pdf" );
	fakePdfFile = expandPath( "./files/fake-doc.pdf" );

	// Copy REAL text file into FAKE pdf.
	fileCopy( txtFile, fakePdfFile );
	// Copy REAL pdf file into FAKE txt.
	fileCopy( pdfFile, fakeTxtFile );

</cfscript>
<cfoutput>

	<strong>Text Files (.txt)</strong>

	<dl>
		<dt>
			Real &rarr; #getFileFromPath( txtFile )#:
		</dt>
		<dd>
			#fileGetMimeType( txtFile )#
		</dd>
		<!--- Really a PDF disguised as a TXT file. --->
		<dt>
			Fake &rarr; #getFileFromPath( fakeTxtFile )#:
		</dt>
		<dd>
			#fileGetMimeType( fakeTxtFile )#
		</dd>
	</dl>

	<strong>PDF Files (.pdf)</strong>

	<dl>
		<dt>
			Real &rarr; #getFileFromPath( pdfFile )#:
		</dt>
		<dd>
			#fileGetMimeType( pdfFile )#
		</dd>
		<!--- Really a TXT disguised as a PDF file. --->
		<dt>
			Fake &rarr; #getFileFromPath( fakePdfFile )#:
		</dt>
		<dd>
			#fileGetMimeType( fakePdfFile )#
		</dd>
	</dl>

</cfoutput>

I now have a PDF with a .txt file extension and a text file with a .pdf file extension. And, when we run this ColdFusion code, we get the following output:

fileGetMimeType() returns text/plain for a real text file and application/pdf for a PDF file disguised as a text file. It also returns application/pdf for a real PDF file and text/plain for a text file disguised as a PDF file.

As you can see, the correct mime-type is being returned regardless of which file extension is currently in use. For this, ColdFusion must inspect the contents of the file. Since Adobe ColdFusion is closed-source, I have no idea how it's doing this; but, since Lucee CFML is open-source, we can see on GitHub that they are using the Apache Tika project (for their CFML-specific implementation).

On the Tika project page, they say that sometimes they can look for "magic bytes" in the file binary; and, other times, they have to do some more fuzzy matching. We can see this fuzzy matching in action by generating files with random bytes (ie, that have no magic bytes):

<cfscript>

	randomFile = expandPath( "./files/random.bytes" );

	// Write nothing but control characters.
	fileWrite( randomFile, javaCast( "byte[]", [ 1, 2, 3, 4, 5, 6, 7 ] ) );
	writeDump( fileGetMimeType( randomFile ) );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// Write nothing but printable ASCII characters.
	fileWrite( randomFile, javaCast( "byte[]", [ 32, 42, 52, 62 ] ) );
	writeDump( fileGetMimeType( randomFile ) );

</cfscript>

When we run this ColdFusion code, we get two different mime-types reported:

application/octet-stream - for the file that contained nothing but control characters.
text/plain - for the file that contained nothing but printable ASCII characters.

So, even though we don't know exactly what ColdFusion is doing under the hood, we can see that they are using the actual byte content of the file to determine the mime type. This is awesome!
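
To make the "magic bytes" idea a bit more concrete, here's a minimal sketch that checks whether a file opens with the %PDF- signature that PDF files typically start with. This is not how ColdFusion or Tika perform their detection; it only illustrates the general technique. It reuses the doc.pdf path from the first demo, and the bytes and header variable names are just for illustration:

<cfscript>

	pdfFile = expandPath( "./files/doc.pdf" );

	// Read the whole file into a byte array. This is fine for a small demo file; but,
	// for large files, you'd want to read only the first few bytes.
	bytes = fileReadBinary( pdfFile );

	// Decode the bytes as ISO-8859-1 (one character per byte) and grab the first 5
	// characters, which is where the PDF signature is expected.
	header = left( charsetEncode( bytes, "iso-8859-1" ), 5 );

	// Use compare() for a case-sensitive match since == is case-insensitive in CFML.
	writeDump( compare( header, "%PDF-" ) == 0 );

</cfscript>

A hand-rolled check like this only covers one format, which is exactly why leaning on fileGetMimeType() is the more attractive option.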

By default, the fileGetMimeType() function runs in strict mode, which means that it inspects the file content. If you override this behavior, and run it in non-strict mode (passing false in as the second argument), it will only look at the file name. And, if the file name contains an unrecognized extension, it will just return application/octet-stream.
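
To compare the two modes side by side, here's a small sketch that runs both against the PDF disguised as a .txt file. It assumes the fake-text.txt file from the first demo still exists. Given the behavior described above, strict mode should report the PDF content while non-strict mode should report whatever the .txt file extension maps to:

<cfscript>

	// A real PDF that was copied to a file with a .txt extension (see first demo).
	fakeTxtFile = expandPath( "./files/fake-text.txt" );

	// Strict mode (the default): inspects the file content.
	writeDump( fileGetMimeType( fakeTxtFile ) );
	writeDump( fileGetMimeType( fakeTxtFile, true ) );

	// Non-strict mode: only looks at the file name.
	writeDump( fileGetMimeType( fakeTxtFile, false ) );

</cfscript>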

Short link: https://bennadel.com/4727

Reader Comments

James Moberg Nov 16, 2024 at 6:53 PM

"Trust, but verify." -Russian proverb

My mime type unit test contains 409 different file extensions, but I don't use any physical test files. The fileGetMimeType function requires a physical file even when the 2nd parameter is false. (If false & detection is based solely on the file name, why does CF require a physical file to be present?) I often use ColdFusion to generate a file in-memory, serve it using CFContent and have to also include a mime type. I don't want to be required to first save it needlessly to the file system in order to identify the mime type.

We've expanded the UDF from CFLib called getMimeType and use it for all internally-hosted files that we serve via CFContent.

When comparing file extension mime results using ColdFusion 2016's fileGetMimeType and our expanded getMimeType UDF:

82-115 extensions are missing in fileGetMimeType (and return the default "octet-stream")
101 are different from our configured mime types
11 MS-related files had different capitalization (not sure if it's important)

When it comes to ACF's file detection, YMMV. I've encountered images that will return true when using isImageFile, but then throw an error when attempting to be modified by CFImage. I've also encountered instances where isPDF will return true and then throw an error when the same file is used with PDF-related functions. As a result, we've had to leverage alternative command-line libraries to fill in the gaps to finish the job.

Ben Nadel Nov 17, 2024 at 11:20 AM

@James,

I suppose this stuff isn't an exact science. But, I agree that it seems strange that you should need a physical file to test this stuff, especially when the loose/non-strict mode is just looking at the file extension. That said, I mostly deal with image files, and it seems to be decent at that. Though, to be fair, I haven't tested it with many different types (such as webp or tiff or any of the other esoteric types).
