Quick Thought On OOP Data Validation And Why Redundancy Is OK
Ever since my revelation about "valid" domain objects, my mind has been mulling over hot, steamy thoughts on data validation. If we think of domain objects as "Data Types," then they have to do some minimal amount of internal validation just to make sure that they have the information required to exist in a valid state. For example, an Account object might check within its constructor that its assigned AccountNumber has a non-zero length, because it wouldn't make sense for an Account to exist without an AccountNumber. It wouldn't, however, check whether the AccountNumber was valid within the greater system, since it might not know what those rules are.
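A minimal sketch of that idea, written in TypeScript (close to the AS3 used in the comments below) rather than CFML. The class and field names are hypothetical. The constructor guards only the one invariant the Account needs in order to exist, and leaves system-wide rules alone:

```typescript
// Hypothetical Account "data type". The constructor checks only what the
// object itself needs to exist: an account number with some length.
// It does NOT check format, maximum length, or uniqueness; those are
// rules of the greater system that the Account may not know about.
class Account {
  readonly accountNumber: string;

  constructor(accountNumber: string) {
    if (accountNumber.length === 0) {
      throw new Error("Account.InvalidArgument: account number must not be empty.");
    }
    this.accountNumber = accountNumber;
  }
}
```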
So this got me thinking about the code that instantiates an Account object. Because the Account object does some critical data validation, we have to remember that it will throw an exception if the passed-in parameters are not valid. This is both the required and the expected behavior; like trying to instantiate an INT data type with a STRING value, it simply can't be done. As such, in order to prevent exceptions, the code that instantiates an Account object also has to validate the Account parameters to make sure that they are acceptable for Account creation.
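A sketch of the calling side, under the same assumptions (TypeScript, hypothetical names, and a made-up result shape). The caller repeats the non-empty check so that bad input becomes a user-facing message rather than an exception:

```typescript
// Minimal Account repeated so this sketch is self-contained.
class Account {
  readonly accountNumber: string;
  constructor(accountNumber: string) {
    if (accountNumber.length === 0) {
      throw new Error("Account number must not be empty.");
    }
    this.accountNumber = accountNumber;
  }
}

// Hypothetical result shape for the calling code (an assumption, not from the post).
type CreateResult =
  | { ok: true; account: Account }
  | { ok: false; errors: string[] };

// The calling code performs the same check up front, but with a different
// intent: to turn bad form input into a friendly message instead of letting
// the constructor throw in the middle of a user-driven work flow.
function createAccountFromForm(rawNumber: string): CreateResult {
  const errors: string[] = [];
  if (rawNumber.length === 0) {
    errors.push("Please enter an account number.");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, account: new Account(rawNumber) };
}
```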
Therefore, to handle Account creation gracefully, we have both the calling code and the Account object constructor checking to see if AccountNumber has a length. At first, I thought this was a duplication of logic; I wondered how I could factor this logic check out into a single place. But then, I realized that there wasn't really any redundancy going on. Yes, the same checks are being performed in two different places, but the intent of those checks is different, and I believe that this means that the logic is not duplicated.
To explain further, the logic check within the Account constructor is done with the intent of making sure that the data type "Account" is valid. The logic check within the calling code, on the other hand, is done with the intent of making sure that the data provided can be used to create an Account. While the difference might seem subtle, it's actually quite large. The calling code does not want to throw an exception because it is most likely part of a larger, user-driven work flow. The Account, on the other hand, couldn't care less about work flow or whether or not a raised exception would be a bad idea. Furthermore, the Account data type might be used in many situations within a single application and cannot depend on any particular calling code having validated its data. For example, you might be creating Accounts based on a database query. The calling code might expect the database query to contain only valid data and therefore do no preliminary validation. But the Accounts it tries to create from that query data still have to perform critical validation, as we can't have Account objects being created if the database data is corrupted.
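A sketch of the database case, again in TypeScript with a hypothetical row shape. The loader trusts its query and does no pre-check, so the constructor is the only thing standing between a corrupted row and an invalid Account:

```typescript
class Account {
  readonly accountNumber: string;
  constructor(accountNumber: string) {
    if (accountNumber.length === 0) {
      throw new Error("Account number must not be empty.");
    }
    this.accountNumber = accountNumber;
  }
}

// Hypothetical shape of a row returned by a database query.
interface AccountRow {
  account_number: string;
}

// This loader assumes the query returns clean data and does no preliminary
// validation. If a row is corrupted, the Account constructor still refuses
// to build an invalid object, so the problem surfaces as an exception.
function loadAccounts(rows: AccountRow[]): Account[] {
  return rows.map((row) => new Account(row.account_number));
}
```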
All to say, seemingly redundant data validation in an object oriented work flow is not really redundant as the intent behind the validation is different depending on where it is being performed.
Reader Comments
Hi Ben,
I see where you're coming from, and I know it's hard to see these objects as "data types". One thing that helped me decide not to make my objects validate themselves was a simple thought I had about the DollarFormat() function in CF. Although this is not the same type of object that we create, I asked myself: does DollarFormat() validate the data that I pass into it before it processes it? Or do I make sure that I pass it a numeric value? We know that DollarFormat() always needs a numeric/decimal/etc. value; we will never pass it a string or a boolean, etc. So why not do the same with our objects? Maybe we grab some data out of the database, instantiate the object, and let our object worry about business behaviors and that sort of thing.
Anyhow, it's just a thought; it's what helped me get over that mental hurdle.
@Hatem,
I am not sure whether we are saying the same thing or not. From what I gather, we are, since using DollarFormat() requires validation *both* in the calling code and in the internals of the DollarFormat() method itself. To me, this is exactly what I'm saying about calling an object constructor.
Our calling code might check to see if the given value is within a given range, maybe $0 to $1,000,000... that's the business logic. But when we call the DollarFormat() function, it does additional checking to see if the given value is numeric and if it is in the range of valid numbers it can operate on (it can't go into the billions, for example, without throwing INT errors).
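To make the split concrete, here is a rough TypeScript analogy. `formatDollars()` is a hypothetical stand-in for DollarFormat(), not its actual implementation; it checks only that it received a finite number, while the caller owns the $0 to $1,000,000 business rule:

```typescript
// Hypothetical DollarFormat()-style function. It guards only what it needs
// to do its own job: a finite number. It knows nothing about business ranges.
function formatDollars(value: number): string {
  if (!Number.isFinite(value)) {
    throw new TypeError("formatDollars expects a finite number.");
  }
  const fixed = Math.abs(value).toFixed(2);
  // Insert a comma before every group of three digits left of the decimal point.
  const grouped = fixed.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return (value < 0 ? "-$" : "$") + grouped;
}

// The calling code owns the business rule and decides how to report a
// violation (here, with a message instead of an exception).
function formatOrderTotal(total: number): string {
  if (total < 0 || total > 1_000_000) {
    return "Order total is out of the allowed range.";
  }
  return formatDollars(total);
}
```

Note that a NaN total slips past the caller's range check (both comparisons are false) and is then rejected by the function's own guard, which is exactly the "validation in both places" being discussed.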
@Ben,
"Our calling code might check to see if the given value is between a given range, maybe $0 and $1,000,000... that's the business logic"
This type of validation is fine (for me anyway) to have in the object, since it is business logic. I guess what I was thinking of when I heard the word validation was validating a string, int, etc.
Ben,
Often, a form will be the energizing agent for creating a domain object. This is why I recommend having a "Form" class that has methods like: display(), populate(), and validate(). This has a lot of value and shows how objects can be used for simple but very useful things.
You can see in your example that an Account would need to protect itself against one level of badness -- it's expecting a numeric value and someone passed in an array -- but it can't be expected to know whether or not a numeric value is actually a duplicate of another account.
@Hal,
So where would you check to see if the account number is a duplicate of another account, assuming that someone submitted a new form?
@Hal,
While I have only just started thinking about this Form object idea, I do believe that there is a lot to it. Dan Wilson tried to explain this to me a long time back and I was very much opposed to it at the time. But since then, I have come mostly full circle (or is that 180?).
Your form class needs to validate its data. Typically, it will delegate this to another class that knows how to validate stuff. You might have multiple validation classes. One of these guys will know how to check to see if the account number is valid, then return the results to the Form object that can then return a nicely formatted error to the form page.
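A sketch of that arrangement in TypeScript, with every class name hypothetical. The Form object holds the submitted data and delegates to small validator objects, one of which knows about existing account numbers (a `Set` stands in for a real repository lookup). display() is omitted for brevity:

```typescript
// Hypothetical validator contract: returns user-friendly messages, never throws.
interface FieldValidator {
  validate(data: Record<string, string>): string[];
}

class RequiredFieldValidator implements FieldValidator {
  private field: string;
  private label: string;
  constructor(field: string, label: string) {
    this.field = field;
    this.label = label;
  }
  validate(data: Record<string, string>): string[] {
    return (data[this.field] ?? "").length === 0 ? [`${this.label} is required.`] : [];
  }
}

// Knows about the rest of the system (existing account numbers), which the
// Account object itself deliberately does not.
class UniqueAccountNumberValidator implements FieldValidator {
  private existing: Set<string>;
  constructor(existing: Set<string>) {
    this.existing = existing;
  }
  validate(data: Record<string, string>): string[] {
    const value = data["accountNumber"] ?? "";
    return this.existing.has(value) ? [`Account number ${value} is already in use.`] : [];
  }
}

// Form object: populate() stores the input, validate() collects messages
// from every delegated validator for display back on the form page.
class AccountForm {
  private data: Record<string, string> = {};
  private validators: FieldValidator[];
  constructor(validators: FieldValidator[]) {
    this.validators = validators;
  }
  populate(input: Record<string, string>): void {
    this.data = { ...input };
  }
  validate(): string[] {
    const errors: string[] = [];
    for (const validator of this.validators) {
      errors.push(...validator.validate(this.data));
    }
    return errors;
  }
}
```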
@Hal,
Right! I think one of the nicest things about the Form object idea is that it can pass back form-specific error messages such that I don't have to concern the client with converting data-based error messages into user-friendly error messages.
It's very powerful, Ben. Depending on what/if any framework you're using, you can have the form post directly to the Form object's populate method, which will then call its validate method.
@Hal,
Word up :)
Just to be ornery, I'd argue that there's no such thing as "duplication of [validation] logic".
If the whole point of OOP is that you can take out one black box component and drop another in its place, or take a black box component and reuse it someplace else, then each and every object should be as paranoid as possible about what it accepts. If that means you perform the same check 5 times as you traverse the object chain, then so be it.
As long as the paranoia fits the scope of the object, then keep it in. (That is, if you have a function that hyphenates the account number, then while checking the length would be a valid check, checking to make sure it is a valid account number probably isn't.)
(Yes, yes, yes, I know you can make a counter argument to this where the validation is some expensive operation, but that's an edge case and an optimization problem, not a structure problem.)
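A sketch of paranoia that fits the object's scope, in TypeScript and assuming (purely for illustration) an eight-character account number formatted as XXXX-XXXX. The hypothetical hyphenation helper checks the length it depends on and nothing else:

```typescript
// Hypothetical helper: formats an 8-character account number as "XXXX-XXXX".
// The 8-character format is an assumption for illustration. The function
// checks only what it needs in order to hyphenate (the length); whether the
// number belongs to a real, valid account is outside its scope.
function hyphenateAccountNumber(accountNumber: string): string {
  if (accountNumber.length !== 8) {
    throw new RangeError("hyphenateAccountNumber expects exactly 8 characters.");
  }
  return `${accountNumber.slice(0, 4)}-${accountNumber.slice(4)}`;
}
```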
@Rick,
My objects aren't paranoid... who told you that? Did the CIA tell you that? What are they saying about me? What did you tell them? Can I trust you?
I've been writing ActionScript 3.0 almost exclusively for the past year, with Cairngorm MVC, so I am totally drinking the OO Kool-Aid. I do not have much experience with CF OOP, so I might be totally off here, but if you are simply validating the data types of the properties in your constructor, why don't you strictly type the input parameters of your constructor? In AS3, I would do this:
function Account(name:String, number:int, typeId:int, ownerName:String)
{
    accountName = name;
    accountNumber = number;
    accountTypeId = typeId;
    accountOwnerName = ownerName;
}
No validation necessary. Can you do something similar in CF?
@Eric,
It depends on what you mean for an Account to represent. Is it valid for an account to have an empty-string Number? Or, since you are using INTs, is it valid for an account to have a negative number or zero?
Some of this will make sense to check in the calling page work flow (more business logic); some of it will make sense to check in the constructor for data type violations.
function Account(name:String, number:int, typeId:int, ownerName:String)
{
    if (number < 1){
        throw new Error( "InvalidNumber: Must be positive integer." );
    }
    accountName = name;
    accountNumber = number;
    accountTypeId = typeId;
    accountOwnerName = ownerName;
}
... I am not an AS3 person, but something like that.
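For comparison, a TypeScript version of the same constructor (the class is hypothetical). TypeScript has no `int` type, so the static type `number` rejects a string at compile time but still admits 0, negatives, fractions, and NaN; the runtime guard covers those:

```typescript
class Account {
  readonly accountName: string;
  readonly accountNumber: number;
  readonly accountTypeId: number;
  readonly accountOwnerName: string;

  constructor(name: string, num: number, typeId: number, ownerName: string) {
    // Static types stop Account("x", "abc", ...) at compile time, but only
    // a runtime check can reject values like 0, -5, 1.5, or NaN.
    if (!Number.isInteger(num) || num < 1) {
      throw new RangeError("InvalidNumber: must be a positive integer.");
    }
    this.accountName = name;
    this.accountNumber = num;
    this.accountTypeId = typeId;
    this.accountOwnerName = ownerName;
  }
}
```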
Well, I went through a similar dilemma in designing my value objects in my Flex applications, and I decided, as a general rule, NOT to perform validation in the constructor, but instead to have any additional (not front-end) validation performed in a separate command class where the Account is created and perhaps added to the Model, i.e. "CreateAccount". This works very well for me, keeping different types of logic separated and keeping value objects simple, usually just properties. I feel this is a better practice for code portability and code readability.
@Eric,
I am not sure where you get your data from. Is it after you get your data from its persisted source (XML file, Web Service, etc.) that you use the CreateAccount command to instantiate and populate the object?
Or, are you talking only about dealing with user-entered data?
Also, I am not sure how this technique relates to portability and readability. Can you explain further?
@Rick: Preach it, brother!
I follow this practice both for instantiating Objects from data returned by WebServices, and for instantiating Objects on-the-fly based on user input.
For instance, I have a UI which allows a user to create a new set of report definitions and then run that report. As they create the Report using a step-by-step process, I run several pieces of validation, e.g. validating that the percentages they enter on one step add up to 100. When they click the "Save Report Definitions" button, I call a command that creates a "Report" instance and validates things such as whether the name they gave the report is already in use. I then call a WebService to "save" the report definitions to the database. The next time they load the application, that report definition is returned by a WebService, and this also creates a Report instance, but using a different command, since no validation is necessary.
Also, in some cases, I create an instance of an Object with no data in it, and add data to it on the fly, either by calling commands to do so, or by simply setting a single property. By default, I write all my Objects with no validation in the constructor, so that I have this flexibility. I guess it depends on the type of application you are creating, and the platform which it is on, but I'm just sharing my general practices for OOP to give you food for thought....
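A sketch of the two-command arrangement described above, in TypeScript. `Report`, `CreateReportCommand`, `LoadReportCommand`, and their fields are hypothetical; the point is that the value object never validates, and each command decides how much validation its data source needs:

```typescript
// Plain value object: no validation in the constructor, so it can also be
// created empty and filled in later.
class Report {
  name: string;
  percentages: number[];
  constructor(name = "", percentages: number[] = []) {
    this.name = name;
    this.percentages = percentages;
  }
}

// Command for user-created reports: validates before creating.
// Returns either the new Report or a list of error messages.
class CreateReportCommand {
  private existingNames: Set<string>;
  constructor(existingNames: Set<string>) {
    this.existingNames = existingNames;
  }
  execute(name: string, percentages: number[]): Report | string[] {
    if (this.existingNames.has(name)) {
      return [`A report named "${name}" already exists.`];
    }
    return new Report(name, percentages);
  }
}

// Command for definitions returned by the web service: the data was
// validated when it was saved, so it is trusted and mapped directly.
class LoadReportCommand {
  execute(dto: { name: string; percentages: number[] }): Report {
    return new Report(dto.name, dto.percentages);
  }
}
```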
@Eric,
So you use a different command for Report creation depending on whether the data comes back from the web service or from the user-driven form?
@Ben: I guess I have to disagree with your logic. It *is* duplicated logic, and the problem will come in when you have to change that logic.
For example, your company acquires a new subsidiary, and their account numbers are a different length. Now you have to find all the locations where you are validating your data -- two places! -- and change them. Or you subclass Account into two new classes, legacyAccount and newWhizbangyAccount. Even though you can now pass an Account object around, you have validation that expects an account number of a specific length scattered around your code. You now either throw away the one validation, or you have to know that you are passing a specific account type.
For me, duplicated logic is the same as duplicated data -- it can cause problems, and you need to have a good reason to allow it (usually efficiency).
@Tom,
I believe you misunderstood my example. I wouldn't be checking max length in the Account class itself because the Account class doesn't have any concept of how long an account number can be in the application.
The only length check the Account class would perform is that the account number has any length at all, since an Account cannot exist without an account number.
Therefore, if you had a new company that had different account number lengths, you would only need to change the check in the one place (the business logic).
I agree our Account Object should do common sense validation. The Account Object needs to stand on its own two feet, blind to the intentions of the thing that called it.
Very interesting reading.
I hope this doesn't sidetrack the discussion (this is a great topic Ben!), but I wanted to ask about returning errors in objects. Do your object methods throw errors? Or do all of your methods have some type of return logic that is consistent for each method? For example, let's say there is a method called getFoo() that returns some piece of data from a database. There is a problem with accessing the database, and the query fails. How would you return control to the code that called getFoo()? I can see two ways: 1) throw an error and make the caller of getFoo() catch it or 2) have getFoo() return a structure (or something like it) with keys indicating whether getFoo() succeeded or failed and the return data or the error condition.
My gut tells me every method except init() should return some value or structure. It gets back to the concept of an object as a black box.
Thoughts?
Thanks,
Marc
@Marc,
I think if you ask an object to take an action whether it be to initialize itself or some other behavior and it cannot perform this action as expected, it should throw an exception that may or may not be handled by the calling code (if not handled by the calling code, hopefully the Application itself has a top-level error handler).
If, on the other hand, you ask an object IF it can perform an action, then in that case, I think it would return some sort of error collection.
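A sketch of that distinction in TypeScript, using a hypothetical withdraw() action on an Account: asking IF returns a collection of errors, while the action itself throws when it cannot proceed:

```typescript
// Hypothetical account with a query method and an action method.
class Account {
  private balance: number;
  constructor(balance: number) {
    this.balance = balance;
  }

  // Asking IF the action is possible: returns error messages, never throws.
  canWithdraw(amount: number): string[] {
    const errors: string[] = [];
    if (!(amount > 0)) errors.push("Amount must be greater than zero.");
    if (amount > this.balance) errors.push("Insufficient funds.");
    return errors;
  }

  // Asking for the action itself: throws if it cannot be carried out,
  // leaving the caller (or a top-level handler) to deal with it.
  withdraw(amount: number): number {
    const errors = this.canWithdraw(amount);
    if (errors.length > 0) {
      throw new Error(errors.join(" "));
    }
    this.balance -= amount;
    return this.balance;
  }
}
```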
Then, I think there are times when you have an API layer whose behavior is to NEVER throw an error, but always return a unified response. For example, my remote APIs generally have the following return object:
{
    Success = true,
    Data = "",
    Errors = {}
}
The idea behind this is that the API "request" itself is never supposed to fail; the response only reports whether or not it succeeded. So, my AJAX request can check:
response.SUCCESS
... to see if the API request was successful (in terms of the intent).
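The same unified response, sketched in TypeScript. The field names are lower-cased, `data` defaults to `null` rather than an empty string, and the generic wrapper and the `general` error key are assumptions for illustration:

```typescript
// Mirrors the Success / Data / Errors struct above.
interface ApiResponse<T> {
  success: boolean;
  data: T | null;
  errors: Record<string, string>;
}

// Wraps an operation so the API "request" itself never throws; any failure
// is reported inside the response instead.
function handleRequest<T>(operation: () => T): ApiResponse<T> {
  try {
    return { success: true, data: operation(), errors: {} };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, data: null, errors: { general: message } };
  }
}
```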
Now, going back to Hal's example of the Form object that holds data prior to Account creation, I think there would/should definitely be a Validate() method on the Form object that returns a collection of user-friendly error messages.
@Ben,
Thanks for your insight. So do you define the types of errors thrown by a method as part of some kind of system contract?
@Marc,
I am fairly new to this, so I don't have a set plan just yet. What I have been leaning towards is something like this:
ClassName.MethodName.Target.ErrorType
So, if I tried to create an Account with an invalid Account Number, I might throw this:
<cfthrow
type="Account.Init.Number.InvalidArgument"
message="The number parameter you provided is not valid."
detail="The number parameter you provided, #ARGUMENTS.Number#, is not valid. Account numbers must be integers greater than zero."
/>
That's what I've been leaning towards as a standard methodology.
Ending a session and starting a new one
My end consumer is already logged in (Session.UserID).
They placed an order, and now they are done.
<cfquery datasource="Generic" name="PurchaseOrders">
SELECT *
FROM Orders
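<!--- cfqueryparam binds the URL value instead of pasting it into the SQL (guards against SQL injection) --->
WHERE WeborderID = <cfqueryparam value="#url.WebOrderID#" cfsqltype="cf_sql_varchar">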
</cfquery>
<CFSET getitbaby = Session.UserID>
<!--- Note: this clears data shared by every user of the application, not just this session --->
<CFSET STRUCTCLEAR(APPLICATION)>
<CFSET STRUCTCLEAR(SESSION)>
<CFLOOP INDEX="x" LIST="#GetClientVariablesList()#">
<CFSET DELETED = DELETECLIENTVARIABLE("#x#")>
</CFLOOP>
<CFCOOKIE NAME="cfid" EXPIRES="NOW">
<CFCOOKIE NAME="cftoken" EXPIRES="NOW">
<CFCOOKIE NAME="cfglobals" EXPIRES="NOW">
<CFSET Session.UserID = getitbaby>
This works only if they click the button I created for them.
<cfquery datasource="Generic" name="thechecker">
SELECT *
FROM Loger
WHERE ID = <cfqueryparam value="#Session.userID#" cfsqltype="cf_sql_integer">
</cfquery>
<cfif thechecker.Admin eq 1>
<cfinclude template="includes/Header1.cfm">
<td width="693" align="left" valign="middle">
<table align="center" border="0" bordercolor="336699" cellpadding="3">
<tr>
<td>
<cfform action="admin.cfm">
Your Order is complete
<input type="hidden" name="theid" value="#Session.UserID#">
<input type="submit" value="Continue With This Program" style="font-family:Verdana; font-size:10px; font-weight:bold;">
</cfform>
Any way around this?
Exploring :-)
@Ben
The problem with throwing errors like that is that you can't generically catch them.
Account.Init.Number.InvalidArgument
With this you can cfcatch Account errors, or Account.Init errors, or even Account.Init.Number errors, but you can't catch InvalidArgument errors generically.
CF can catch errors based on dot notation, so you'd be better off doing it in reverse.
InvalidArgument.Number.Account.Init
Now you can cfcatch type="InvalidArgument" and catch errors from anywhere in the call chain, instead of needing to explicitly catch each one.
Not sure why you want the class name and method name in there though. That'll be in the stacktrace. :)
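A rough TypeScript emulation of the matching rule Elliott describes (a catch type matches when it equals the thrown type or is a leading dot-separated run of its segments). This is an illustrative sketch, not ColdFusion's actual implementation, and `TypedError` is a hypothetical class:

```typescript
// Hypothetical error carrying a dotted CF-style type string.
class TypedError extends Error {
  readonly type: string;
  constructor(type: string, message: string) {
    super(message);
    this.type = type;
  }
}

// True when catchType equals thrownType or is a leading run of its
// dot-separated segments (so "Account" does not match "AccountX.Init").
function matchesCatchType(thrownType: string, catchType: string): boolean {
  return thrownType === catchType || thrownType.startsWith(catchType + ".");
}

// Original ordering: only the leading segments (Account, Account.Init, ...) are catchable.
const original = new TypedError("Account.Init.Number.InvalidArgument", "Bad number.");
// Reversed ordering: the error kind comes first, so it can be caught generically.
const reversed = new TypedError("InvalidArgument.Number.Account.Init", "Bad number.");
```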
@Bruce,
When clearing the SESSION scope, I would suggest making a temp copy of the CFID and CFTOKEN values as they are not automatically re-populated into the SESSION after the StructClear().
@Elliott,
Ahh, good point. I hadn't thought about catching specific types of errors. To be honest, when it comes to catching errors, I generally always do a generic catch. I suppose you are absolutely correct - generic type and then I can put the specifics in the Message / Detail.
Hmm... this is sort of back to the original post's idea of redundant validation, but it also shows how we have tackled error/success/warning feedback. I definitely think each object should be a black box (IMHO) and not rely on anything outside itself to function. We have grappled with how to do that consistently and at a large scale, and have finally accepted and implemented the idea of using a metadata repository to help us systematize validation/encryption/advanced validation/glossaries/et cetera.
We have moved to incorporating a generic DAO coupled with a DataDictionary (DD) that contains metadata about field types so that we can consistently apply both basic and advanced validation. It, along with another wrapper that builds a display widget for the form type (AJAX-based, and it also handles concurrency issues just like Blaze/LiveDS does), takes care of end-user validation (at the form entry level) and server-side validation at the DB commit. It handles custom formatting (phones/SSNs/accounts, whatever) as well as decryption/encryption for secure fields, and passes along end-user help and error messages from the DD glossary. They also (all our functions/methods, actually) always pass back a structure that includes a status (success/failure/warning) and end-user messaging, which can then be displayed to the end user if relevant. Sometimes they throw, but usually only in true error situations; otherwise they *fail* but return a detailed structure.
The meta information is abstracted enough that ANY process not going through the DAO (and there are few) can still easily apply the same rules, validation, and formatting by asking the DD what to do.
For complex objects or process points (let's say at account creation) a third wrapper (business function) would validate complex biz logic and relationships (like if this value exists, then these other values must exist as well) before talking to the DS/DAO (which perform the other validations and feed this back to the business function on fail/warning). This top level biz logic is either contained in the complex object and/or in a workflow system.
In short, the DD approach has helped us immensely in making sure our black box strata always apply the same basic/advanced validation in a systematic and scalable way. Objects don't worry about (or trust) other objects' validation (they are paranoid). Our coding standards tell us how to consistently feed success/failure info through the objects so they can communicate effectively.
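A minimal sketch of a dictionary-driven validator in TypeScript. The field definitions, patterns, and messages are hypothetical; what it shows is that any code path can ask the same metadata how to validate a field instead of re-implementing the rules:

```typescript
// Hypothetical data dictionary entry: metadata about one field.
interface FieldDefinition {
  label: string;
  required: boolean;
  pattern?: RegExp;
  help: string;
}

// Hypothetical dictionary; in a real system this would come from a shared repository.
const dataDictionary: Record<string, FieldDefinition> = {
  ssn: { label: "SSN", required: true, pattern: /^\d{3}-\d{2}-\d{4}$/, help: "Use the format 123-45-6789." },
  phone: { label: "Phone", required: false, pattern: /^\d{3}-\d{3}-\d{4}$/, help: "Use the format 555-123-4567." },
};

// Generic validator driven entirely by the dictionary: a DAO, a form, or
// any other process can call this and get the same rules and messages.
function validateRecord(record: Record<string, string>): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(dataDictionary)) {
    const def = dataDictionary[field];
    const value = record[field] ?? "";
    if (value === "") {
      if (def.required) errors.push(`${def.label} is required.`);
      continue;
    }
    if (def.pattern && !def.pattern.test(value)) {
      errors.push(`${def.label} is invalid. ${def.help}`);
    }
  }
  return errors;
}
```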
@William,
The data dictionary approach seems very interesting. At cf.Objective() I attended Bob Silverberg's presentation on his ValidateThis framework, which I think uses some similar philosophies: using external data definitions to create both server-side and client-side validation.
I like this idea and am still trying to wrap my head around it (visually). Seems very powerful.