Skip to main content
Ben Nadel at the Angular NYC Meetup (Dec. 2018) with: Ben Lesh
Ben Nadel at the Angular NYC Meetup (Dec. 2018) with: Ben Lesh

Using fileGetMimeType() To Determine File Type In ColdFusion

By
Published in Comments (4)

This morning, in a discussion about inspecting file upload contents within the temp directory, Brian Reilly taught me that there is a native ColdFusion function for determining a given file's mime-type: fileGetMimeType(). This function—when operating in the default "strict mode"—will inspect the contents of a given file and return the true mime-type, regardless of which file extension is being used. I can't believe this has existed since ColdFusion 10 and I didn't know about it!

To see fileGetMimeType() in action, I'm going to create two files: one is a native .txt file and one is a native .pdf file. Then, I'm going to copy each of these files into another file with an incorrect file extension and see which mime-type is returned:

<cfscript>

	txtFile = expandPath( "./files/text.txt" );
	fakeTxtFile = expandPath( "./files/fake-text.txt" );

	pdfFile = expandPath( "./files/doc.pdf" );
	fakePdfFile = expandPath( "./files/fake-doc.pdf" );

	// Copy REAL text file into FAKE pdf.
	fileCopy( txtFile, fakePdfFile );
	// Copy REAL pdf file into FAKE txt.
	fileCopy( pdfFile, fakeTxtFile );

</cfscript>
<cfoutput>

	<strong>Text Files (.txt)</strong>

	<dl>
		<dt>
			Real &rarr; #getFileFromPath( txtFile )#:
		</dt>
		<dd>
			#fileGetMimeType( txtFile )#
		</dd>
		<!--- Really a PDF disguised as a TXT file. --->
		<dt>
			Fake &rarr; #getFileFromPath( fakeTxtFile )#:
		</dt>
		<dd>
			#fileGetMimeType( fakeTxtFile )#
		</dd>
	</dl>

	<strong>PDF Files (.pdf)</strong>

	<dl>
		<dt>
			Real &rarr; #getFileFromPath( pdfFile )#:
		</dt>
		<dd>
			#fileGetMimeType( pdfFile )#
		</dd>
		<!--- Really a TXT disguised as a PDF file. --->
		<dt>
			Fake &rarr; #getFileFromPath( fakePdfFile )#:
		</dt>
		<dd>
			#fileGetMimeType( fakePdfFile )#
		</dd>
	</dl>

</cfoutput>

I now have a PDF with a .txt file extension and a text file with a .pdf file extension. And, when we run this ColdFusion code, we get the following output:

fileGetMimeType() returns text/plain for a real text file and application/pdf for a PDF file disguised as a text file. It also returns application/pdf for a real PDF file and text/plain for a text file disguised as a PDF file.

As you can see, the correct mime-type is being returned regardless of which file extension is currently in use. For this, ColdFusion must inspect the contents of the file. Since Adobe ColdFusion is closed-source, I have no idea how it's doing this; but, since Lucee CFML is open-source, we can see on GitHub that they are using the Apache Tika project (for their CFML-specific implementation).

On the Tika project page, they say that sometimes they can look for "magic bytes" in the file binary; and, other times, they have to do some more fuzzy matching. We can see this fuzzy matching in action by generating files with random bytes (ie, that have no magic bytes):

<cfscript>

	randomFile = expandPath( "./files/random.bytes" );

	// Write nothing but control characters.
	fileWrite( randomFile, javaCast( "byte[]", [ 1, 2, 3, 4, 5, 6, 7 ] ) );
	writeDump( fileGetMimeType( randomFile ) );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// Write nothing but ASCII characters
	fileWrite( randomFile, javaCast( "byte[]", [ 32, 42, 52, 62 ] ) );
	writeDump( fileGetMimeType( randomFile ) );

</cfscript>

When we run this ColdFusion code, we get two different mime-types reported:

  • application/octet-stream - for the file that contained nothing but control characters.

  • text/plain - for the file that contained nothing but ASCII characters.

So, even though we don't know exactly what ColdFusion is doing under the hood, we can see that they are using the actual byte content of the file to determine the mime type. This is awesome!

By default, the fileGetMimeType() function runs in strict mode, which means that it inspects the file content. If you override this behavior, and run it in non-strict mode (passing false in as the second argument), it will only look at the file name. And, if the file name contains an unrecognized extension, it will just return application/octet-stream.

Want to use code from this post? Check out the license.

Reader Comments

89 Comments

"Trust, but verify." -Russian proverb

My mime type unit test contains 409 different file extensions, but I don't use any physical test files. The fileGetMimeType function requires a physical file even when the 2nd parameter is false. (If false & detection is based solely on the file name, why does CF require a physical file to be present?) I often use ColdFusion to generate a file in-memory, serve it using CFContent and have to also include a mime type. I don't want to be required to first save it to needlessly to the file system in order to identify the mime type.

We've expanded the UDF from CFLib called getMimeType and use it for all internally-hosted files that we serve via CFContent.

When comparing file extension mime results using ColdFusion 2016's fileGetMimeType and our expanded getMimeType UDF:

  • 82-115 extensions are missing in fileGetMimeType (and return the default "octet-stream")
  • 101 are different from the our configured mime types
  • 11 MS-related files had different capitalization (not sure if it's important)

When it comes to ACF's file detection, YMMV. I've encountered images that will return true when using isImageFile, but then throw an error when attempting to be modified by CFImage. I've also encountered instances where isPDF will return true and then throw an error when the same file is used with PDF-related functions. As a result, we've had to leverage alternative command-line libraries to fill in the gaps to finish the job.

15,902 Comments

@James,

I suppose this stuff isn't an exact science. But, I agree that it seems strange that you should need a physical file to test this stuff, especially when you the loose/non-strict mode is just looking at the file extension. That said, I mostly deal with image files, and it seems to be decent at that. Though, to be fair, I haven't tested it with many different types (such as webp or tiff or any of the other esoteric types).

89 Comments

After your article, I've started comparing my UDF's static configuration of extensions & MIME types. One the first mismatches was the .ai extension for Adobe Illustrator.

My UDF returns application/postscript and Adobe's BIF returns application/illustrator.

This article from 2022 states that the correct MIME type is application/pdf.
https://radu.link/mime-type-adobe-illustrator-ai-files/

Which is actually correct? Do other platforms & browsers treat this file with a consistent MIME type? Probably not. :(

I prefer to use the UDF approach when determining the MIME type to deliver files via CFContent. Regarding uploaded files, I've had to leverage command line tools like Exiftool, Exiv2 and MediaInfo as the Java libraries used with ColdFusion have historically returned an unreasonable amount of false positives for me. #YMMV

15,902 Comments

@James,

When it comes to mime-types validation, I think there are two camps of people:

  1. Good actors who just happen to upload the wrong file (ex, they have a PDF that is actually a .pdf.zip or something they just don't realize what they're doing).

  2. Bad actors who are trying to do something malicious (ex, upload a .exe file using a .jpg extension).

I call this out to try and work backwards from what the validation is attempting to do. In the case of the bad actor, we want to validate because we want to limit the possible vulnerable landscape. So, I think the more that a malicious actor can cross-contaminate with other users, the stricter we have to be in our validation. But, for cases where a user can only harm themselves, we can probably be less in-depth in our mitigation techniques.

This is more me just talking to myself 🤣 justifying the degree to which I want to worry about this sort of thing.

That said, it seems so odd that this isn't just a solved problem already. But, I guess a file is just a collection of bytes, it's a complicated problem to solve.

Post A Comment — I'd Love To Hear From You!

Post a Comment

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel