Passing isArray() Decision Function Does Not Ensure Member Methods In Lucee CFML 5.3.3.62
This post is primarily a note-to-self so I don't make this mistake again. But, the other day, when I was working on the memory-leak detector code for Lucee CFML, I ran into a fun edge-case having to do with Reflection-style programming. In that post, I used Lucee's Decision functions (e.g., isArray(), isStruct(), isBinary()) in order to figure out how to generate a string-based representation of a complex value. What I discovered, once my code hit production, is that passing the isArray() decision function does not ensure that the given value has array member methods in Lucee 5.3.3.62.
To demonstrate this, I can recreate the case that bit me: binary values. A binary value is an array of Bytes. But, it's a native Java Array - not a "ColdFusion Array". As such, it passes the isArray() call, but doesn't expose methods like .len():
<cfscript>
value = charsetDecode( "hello world", "utf-8" );
if ( isArray( value ) ) {
echo( "Length: #value.len()#" );
}
</cfscript>
The charsetDecode() function converts the given String to its Binary representation (a byte array). And, when we run the above code, we get a ColdFusion error on the .len() call. In other words, the binary value passed the isArray() decision function; but, didn't provide the Array member-method, .len().
On one hand, this is a surprising behavior since you might expect the decision functions to allow you to make safe decisions about data handling. However, on the other hand, this is not that surprising since a ColdFusion Array is just a type of Array, not a base definition for all Arrays.
Consider the concept of Promises. In JavaScript, we often have to normalize Promises in order to guarantee a set of methods. For example, if we have a Promise that may or may not have been generated by Bluebird, we would have to normalize the value as a Bluebird Promise:
var trustedPromise = Bluebird.resolve( untrustedPromise )
In this case, both untrustedPromise and trustedPromise are Promises. However, they potentially have a different set of public methods. This is because a Bluebird Promise is a type of Promise - not the base class for all Promises.
Similarly, if we have an untrusted array in Lucee CFML, we could always normalize the value using arraySlice():
var trustedArray = ArraySlice( untrustedArray, 1 )
Assuming the untrusted array has a non-zero length, this will result in a trustedArray value that guarantees Lucee CFML Array member methods.
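As a rough sketch of that normalization, the snippet below wraps the arraySlice() call in a helper that also covers the empty-array case. The name toTrustedArray() is hypothetical (it is not a Lucee built-in), and the sketch assumes that arraySlice() accepts a binary value the same way arrayLen() does:

```cfml
<cfscript>
// Hypothetical helper (not part of Lucee): coerce a value that passes
// isArray() into a native CFML array so that member methods are available.
function toTrustedArray( required any untrustedArray ) {

	// arraySlice() is assumed to need at least one element, so an empty
	// input is mapped to a fresh, empty CFML array instead.
	if ( ! arrayLen( untrustedArray ) ) {
		return [];
	}

	return arraySlice( untrustedArray, 1 );

}

// Assumption: the byte array is accepted by arraySlice() just as it is
// accepted by arrayLen().
value = toTrustedArray( charsetDecode( "hello world", "utf-8" ) );
echo( "Length: #value.len()#" );
</cfscript>
```

The guard only uses the global arrayLen() function, so the helper itself never depends on member methods being present on the incoming value.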
Of course, when we can't trust the Array, we could always just fall back to using the more traditional global Array function, arrayLen(). The arrayLen() function is more flexible in the type of values that it can process; and, will happily report the length of a Binary value (byte array).
Ultimately, it comes down to trust: how much do you trust the values you are working with? If you wrote the code that generated the values, then the trust is complete and implicit. However, if you're writing some sort of reflection-style code, as I was, then you're consuming values that you didn't create. As such, you can't trust them. And, if you can't trust them, you either have to cast them to values you can trust; or, you have to fall back to using functions that are more flexible.
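To make that concrete, here is a small sketch of reflection-style code that sticks to decision functions and global functions only. It is not the actual memory-leak detector code; describeValue() and its output strings are hypothetical names chosen for illustration. Because a binary value also passes isArray(), the isBinary() check comes first:

```cfml
<cfscript>
// Hypothetical helper: build a short string description of an unknown
// value without calling any member methods on it.
function describeValue( required any value ) {

	// Binary values pass isArray() as well, so they are checked first.
	if ( isBinary( value ) ) {
		return "[binary: #arrayLen( value )# bytes]";
	}

	// May be a CFML array or a native Java array; arrayLen() is used
	// instead of .len() for that reason.
	if ( isArray( value ) ) {
		return "[array: #arrayLen( value )# items]";
	}

	if ( isSimpleValue( value ) ) {
		return toString( value );
	}

	return "[complex value]";

}

echo( describeValue( charsetDecode( "hello world", "utf-8" ) ) );
</cfscript>
```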
In short, this was not a bug in Lucee CFML - this was a bug in the way I was thinking about the data, its source, and its consumption context. And, hopefully by writing this down, I won't make this mistake again.
Reader Comments
I would disagree here :) What you have described has been one of the biggest screw ups in member functions IMO. Adobe and Lucee both fell into this trap and it's driven me crazy the number of times it's bitten me. The issue is, people assume that
VariableOfCertainType.typeSpecificMemberFunction()
is the same as
typeSpecificFunction( VariableOfCertainType )
i.e.
myString.len()
is the same as
len( myString )
Sadly that's not the case as you found as the compiler doesn't unpack the member function to use the corresponding BIF in the bytecode, but instead at the low level Java objects, the member functions have been added there. The problem is that CFML is loosely typed and while len() will accept any data type that can be successfully converted to a string, the len() member function ONLY exists on actual specific strings. Give it an integer that came back from a DB query and boom! There is no way to effectively code against this as isSimpleValue() will say "true" which really makes me mad from a poor language design standpoint.
Adobe improved this in 2018 (yay Adobe!) by ensuring string member functions will work on booleans and numbers, but Lucee is still trailing in that area, and both engines won't let you do struct or array member functions unless the object you're dealing with is a real live first class CFML struct or array. That means, any Java lib that gives you a HashMap or ArrayList and suddenly your code breaks even though it looks like an array, smells like an array, and talks like an array. Other loosely typed languages like JS don't have an issue like this as they have no corresponding headless functions and they have much, much fewer possible data types. An object or array in JS is simply an object or array with no such thing as vast numbers of subclasses implementing shared interfaces like Java has.
You can't imagine the hours I have wasted arguing with the engineers on both sides that based on CFML's loose typing and "convert-on-the-fly-as-necessary" behavior, member functions need to work the same as their headless counterparts, converting the object as necessary so they always work. I haven't been 100% successful in this however, which unfortunately leaves member functions with a giant asterisk next to them any time you deal with data coming into a UDF that you don't know where it came from. Is that array really a CFML array? is that struct really a CFML struct? There's no way to tell and if it's not, your code won't work. Makes me so sad :sad-panda:
Oh, I totally forgot to add, my typical insurance against this sort of thing is to run them through a BIF as you suggested which will return a "real" CF object. Annoying, but generally effective so long as you don't need to go N-levels deep.
myStruct = {}.append( myJavaHashMap )
myArray = [].append( myJavaArrayList, 1 )
myString = trim( myJavaDouble )
or
myString = myJavaDouble & ''
@Brad,
I definitely share your frustration. In a perfect world, it would definitely "just work". But, that seems like a massive functionality-gap to overcome. That said, the people who write the Lucee platform know way more than I do; so, what seems like a large technical problem to me may just be more of a philosophical problem to them (as it sounds like it might be from what you are saying).
It's a strange place to be. On the one hand, I've always loved how loosely typed ColdFusion is; but, on the other hand, I also appreciate that it has moved a little closer to stronger types (like null support and better JSON support). I think trying to straddle both those worlds is a sticky situation. The good news is that, in the vast majority of cases, I do know where the data came from and I am able to use the member methods, which I enjoy much more than the BIFs.
It's also worth noting that there is a performance impact using these member functions. Some more than others. For instance, doing something like request.keyExists( "somekey" ) is more expensive than mystruct.keyExists( "somekey" ) and both are more expensive than StructKeyExists().
This is true of both memory use and execution time.
For core framework code, I have reverted to trying to remember to use BIFs wherever possible. Member functions are 'nice', but I kinda wished they were never there (not without a complete language rewrite).
@Dominic,
Whoaaaa! Can that really be true? That seems counter-intuitive :( Do you know if that is documented / recorded anywhere? Not that I want to over-optimize or anything (can it really make thaaaaat much of a difference); but, it would be interesting to see any discussions that took place about it.
It is indeed true. I found this when doing a lot of debugging. For the most part, it won't make much difference. Just when you're doing a big old loop or something like that is used by lots of other logic.
I believe I made a PR to Coldbox to do some optimizations for this very thing.
But yes, the reason it is slower is that it needs to do reflection to figure out what the method is. It also needs to detect whether or not there is a key in your struct (with struct member functions) that matches the name you are using. e.g. the following is valid and you'd want to execute this custom function whenever doing request.keyExists():
request.keyExists = function(){ customCode };
We may argue that it shouldn't need to use reflection here and there could be another architecture that would work. BUT this is how it is working nonetheless (and I wouldn't be confident to suggest that there was indeed another way without a massive rewrite).
@Dominic,
Oh very interesting. It didn't even occur to me that you could have a key that collided with a built-in member function. I guess I just assumed that those keys would be invalid. But, yeah, I guess that could be problematic, especially with regard to backwards compatibility.
This is great to know. I'll keep with the member-functions for most code (don't want to start micro-optimizing). But, if I do find that some piece of code represents a hot bottleneck of processing, this could be one step that I take in making it faster.
@All,
Over the weekend, I ran into another meta-style programming issue - well, at least an issue with my mental model. I had assumed that structCopy() would return a native struct; so, I was using it to try and coerce a non-struct value (a ColdFusion Component) into a native Struct data-type so that I could use the Struct member-methods. However, if you call structCopy() on a ColdFusion component, you get back a ColdFusion component: www.bennadel.com/blog/3764-structcopy-does-not-necessarily-return-a-native-struct-in-lucee-cfml-5-3-3-62.htm
I have to make a big neon flashing sign for myself: If you are using reflection style programming, do not use member-methods!