Why NULL Values Should Not Be Used in a Database Unless Required
When you set up a database (at least in MS SQL Server) you can flag a field as allowing NULL values and which default values to take. If you look through people's DB structures, you'll see that a lot of people allow NULL values in their database. This is a very bad idea. I would recommend never allowing NULL values unless the field can logically have a NULL value (and even this I find this only really happens in DATE/TIME fields).
NULL values cause several problems. For starters, NULL values are not the same as data values. A NULL value is basically an undefined values. On the ColdFusion end, this is not terrible as NULL values come across as empty strings (for the most part). But in SQL, NULL and empty string are very different and act very differently. Take the following data table for example:
id name
---------------
1 Ben
2 Jim
3 Simon
4 <NULL>
5 <NULL>
6 Ye
7
8
9 Dave
10
This table has some empty strings (id: 7, 8, 10) and some NULL values (id: 4, 5). To see how these behave differently, look at the following query where we are trying to find the number of fields that do not have values:
SELECT
(
SELECT
COUNT( * )
FROM
test t
WHERE
LEN( t.name ) = 0
) AS len_count,
(
SELECT
COUNT( * )
FROM
test t
WHERE
t.name IS NULL
) AS null_count,
(
SELECT
COUNT( * )
FROM
test t
WHERE
t.name NOT LIKE '_%'
) AS like_count,
(
SELECT
COUNT( * )
FROM
test t
WHERE
t.name IS NULL
OR
t.name NOT LIKE '_%'
) AS combo_count
This returns the following record:
LEN Count: 3
NULL Count: 2
LIKE Count: 3
Combo Count: 5
We were looking for 5 as records 4, 5, 7, 8, and 10 do not have values in them. However, you can see that only one attempt returned 5. This is because while a NULL value does NOT have a length, it is not a data type that makes sense with length. How can nothing have or not have a length? It's like asking "What does that math equation smell like?" You can't make comparisons like that.
So, allowing NULL values makes you work extra hard to get the kind of data you are looking for. From a related angle, allowing NULL values reduces your convictions about the data in your database. You can never quite be sure if a value exists or not. Does that make you feel safe and comfortable when programming?
Furthermore, while running LEN() on a NULL value doesn't act as you might think it to, it also does NOT throw an error. This will make debugging your code even harder if you do not understand the difference between NULL values and data values.
Bottom line: DO NOT ALLOW NULL VALUES unless absolutely necessary. You will only be making things harder for yourself.
Want to use code from this post? Check out the license.
Reader Comments
Ben - If you feel nulls should be rarely used, what about these types of columns?
per_middle_initial -
per_fax_number
per_name_suffix
I could list several other examples where I've allowed nulls. I've designed hundreds of tables and believe allowing null values is perfectly valid where appropriate.
Based on a business rule there is a legitimate reason that the column will not have a value for some entities.
Bruce,
While, I agree that for business rules, there are legitimate reasons to have NULL values, I am not convinced by your examples. You have a field like "middle_initial" that can have NULL values. So, basically you are saying that the user is not allowed to leave that blank. Or rather, a user can leave that blank, but you are saying in your business rules that blank answers are not stored... but are instead converted to no answer.
Look at the code I posted above... what would happen if I was a user who came to you and said "I want to see all middle initials that have no length. You turn around and "No problem, I will find you all the middle initialials that are NULL. Then the user says "Hey loser, I didn't say 'NULL', I said the ones that have no length. So you say, "Yes, but non-length fields are being stored as NULL". So the user says to you "But I didn't store a NULL value, I stored an empty value, who the hell are you to tell me that I can't store a empty string?!?!?"
Of course that is a retarded example. And it is meant to be retarded to point out the craziness over debating the "philosophy" behind using NULL values vs. using empty strings.
My POINT is that using NULLs makes things less convenient! Just look at the query above. NULLs complicate length checks for length and for LIKE'ness. Is that worth it? Is it worth having to check for NULL and for length just so you can say that "philosophically" you used the correct data storage??? That is up to you... I personally say that it is not worth it.. but to each his own.
Ben:
You are missing a critical step in your discussion of null values: validating user input.
You can either set up the database so that there is a constraint (SQL SERVER) to not allow an empty string to be stored in the per_middle_initial field or you could set up a validation check in your application that would ensure only null was stored in per_middle_initial even if the user entered an empty string.
Your example's problem is NOT because of null, but because of poor design that allowed empty strings to be placed into the field.
If you use proper validation checks to ensure only null is stored in the field, even if the user enters an empty string, your query problem is solved.
Bruce
@Bruce,
I disagree... well rather, I don't disagree that validating user input is essential, that is true. By setting up a constraint that stores NULL values or altering your INSERT statements still does not change the fact that by allowing NULL values means that the data in the column may have two different behaviors.
Think about it from this point of view: you architect the most awesome database tables and empty fields are stored as NULL values. All is good. But then a client calls up frantic and furious, "Bruce! I we've fired all of our data entry people! I need to know all user accounts in the database that do not yet have a SSN stored... and I need it YESTERDAY!"
"I'm sorry, Bruce is out of the office today at a ColdFusion conference, I will put Ben on the job!" ... Ben, do you know how to do this?
"Sure thing boss, I will just grab all records whose SSN field has a ZERO length."
"Perfect, another client is satisfied, we have saved the day".
Little do I know that I am about to pass to the client a VERY incomplete record set as I will not be returning any records whose SSN number is NULL, just whose length is ZERO... which, depending on your constraints is not going to return any records at all.
The solution? Fire Ben???
Where is the problem here? Maybe the problem is that Ben (me) does not understand the database architecture? Yes, that could be a problem.
But, I think the real problem here is that at a CONCEPTUAL level, NULL values don't have lengths. It doesn't have a value, how can it have a length? But, of course, that does not jive with the IMPLEMENTATION level that says NULL values are NOT values and therefore, length does not apply.
Now, PLEASE DO NOT GET ME WRONG... I am not saying that we should throw out our understanding of the implementation in order to cater to those who do not understand it.... what I am arguing is that the BENEFITS of catering to the CONCEPTUAL level of understanding out weigh, philosophical principles of NULL vs. Value.
And of course, at the end of the day, to each his own.
Ben,
I disagree about the usage of NULL values in a database. NULL values are an important factor in DB design. Actually you should not allow to store empty strings in a varchar (char, text, etc) field.
As you said, a NULL value means "does not exist". So, if I do not have a middle name, a middle initial for me does not exist. A NULL value would be the proper represantation here, whereas en empty string would be value, and it would be a wrong one.
Sure, CF turns NULL values into empty strings, but, that does not mean we should use these values within the database. The <cfqueryparam> tag offers all we need to handle this:
INSERT INTO myTable (myField)
VALUES (<cfqueryparam cfsqltype="cf_sql_varchar" value="#myCFVar#" NULL="#Not len(myCFVar)">)
Databases (and all their topics, from relational design to NULL values to constraints, etc.) are a very complex field. A good database administrator has most likely spend years gathering his knowledge and enhancing his abilities. And if Mr. X does not know how to handle NULL values, that means we should not let him handle a production database without supervision. It does not mean we should adjust our database to his knowledge. ;-)
Chris
@Chris,
I completely understand what you are saying (and what many others have said). I just do not see the benefit of the NULL value. What higher purpose does it server? What are the advantages of having a NULL value in a varchar field rather than an empty string.
I would say that in 99% of cases, there is no advantage. Sure, you could tell me that we can tell the difference between an empty value and "non-submitted" value.... BUT, I would say in our web applications, how many times is that required OR even relevant?
I can see the advantage (which I have stated many times before) of NOT having NULL values. Can you please tell me the advantage in a real-world way of having NULL values.
Storing a NULL value should save space over storing a string of zero length. I am not completely sure as to how much space will be saved, but in my database program I am trying to use null values instead of strings of length 0 to save space, and cause null pointer exceptions if data is read from the database and not checked properly. Data should be read, and then verified before used, which the previous author of the software I am working on never did.
But really, I do it to save space.
I know this debate is long over, but i thought i'd just throw in my two cents.
Chris - just so we're clear - NULL does not mean "doesn't exist". It means UNKNOWN.
If you have a middle name that is left blank by a user that means he knows what his middle name is... it's nothing - i.e. blank.
Inserting NULL is saying it's UNKNOWN.
NULL = UNKNOWN
BLANK = KNOWN
I'm currently in a debate with a Business Analyst about just this issue. To me there is no good reason to store SQL NULL.
There is a good reason to store null, as in the literal meaning "nothing", blank, or a boolean\bit value.
But SQL NULL's are not necessary.
It's just bad design.
Over and out... since this was a google search i will more than likely not see any update after this one so no point replying to my comment directly to me :)
I think null is very valuable in a lot of cases where you want to know if something is unknwon (i.e. data was not collected) or if it is really empty. For instance, suppose we have a contact form with firstname, lastname and emailaddress. We use it for a year, then the boss asks us to add middlename. Now, from this point on, if our users don't enter anything for middle name, they will have an empty string recorded. But for all the existing data, it would not be correct to assume that those people don't have middle names. We just don't have that data - hence null.
But more importantly, you only seem to be talking about strings. What about numbers? In sql server there is no numeric equivalent of an empty string. So if I don't allow nulls, the best I can do is zero. But what if I am collecting stock prices and reporting them on the hour. If I have no price for the hour, is it reasonable to return zero? I don't think so. Or if you are collecting anaytic data, test results, or any other number of things, zero is NOT the same as unknown.
I think in respone to this statement:
"Maybe the problem is that Ben (me) does not understand the database architecture?"
The only reasonable response is - don't ask people who don't understand SQL to run your reports!
@Ian,
Agreed - there are times when NULL values should be used. Your cost as zero example is a good one. So are things like date-sold which should be NULL if is not an actual date.
And yes, if is is important that you know if a middle name is collected yet or now. My argument is that most of the time, people only look at NULL from a theoretical standpoint and don't take a step back and think "If that string is empty instead of NULL, will it have any impact on my ability to carry on my business logic".... I am saying that if the answer is NO, then a NULL adds more overhead than it is worth from a practical standpoint.
Null means nothing. It's empty. Previous comments have tried to pin it down to a reason why it's empty, this is absurd. There are no conclusions you can draw from a null except that no information was provided, you can't say it wasn't provided because it wasn't known, or it wasn't provided because it doesn't exist. It just wasn't provided.
Now that being said... what extra value can an empty string provide above that of a null? It gives you two empties instead of one. The meaning of each is still arbitrary and can vary from system to system. As one poster said, you can add a field and give it a default of empty string, then use the null vs. empty string to determine if the record was there before the field was added or not... but is this a standard? No. If you bring in some trainee fresh out of college is he gonna look at it and say "oh yeah, null means that field didn't exist when the record was added!" No. This person assigned those uses for that particular system and they apply only to that system.
For this reason, I think the original article is dead-on in that there should only be one 'nothing' and not two to be arbitrarily assigned different meaning in different systems. However, I think he has the wrong one. Null wins, empty string loses. Why? Because null works on all data types, not just strings. Null is necessary. You cannot get rid of it, unless you make all fields of all types required. Empty string is unnecessary. It was added for flexibility only.
Well, I mostly agree with your last assesment - that if it is a question of no nulls or no empties, nulls are way better than empties. But I still stand by the difference between known versus empty. It is true that when we see a null we don't know whether ther is or is not a real value lurking out there. But we know that at the time the record was created, the PROGRAM didn't "know" - that is to say, we know that a user did not submit an empty string - if they had, there would be one in the system. So either they weren't asked, or there was a malfunction. This assumes that we pass empty strings for all form fields that are left blank, which I personally think is only useful in some situations, but in a case where we are trying to determine what data was submitted, we will want to store all values, even empty ones. So if we define our table to allow nulls and to have no defaults, then we know that if a field is null, it was not submitted. In the case of the trainee, she may not know the history of the data, but if she should know right away that because there are nulls in a field, she'd better disclaim her results.
@Alex,
So, are you saying that all of my CFQueryParam tags should be of this nature:
<cfqueryparam
value="#FORM.first_name#"
cfsqltype="CF_SQL_VARCHAR"
null="#NOT Len( FORM.first_name )#"
/>
From what it sounds like, you are suggesting that pretty much all of my CFQueryParam tags should make use of the "null" attribute not matter what?
To Ian, yes I agree that's pretty much the only circumstance that calls for using an empty string. However, those sorts of questions should generally be answered by the time an application goes production. ("Generally" being the key word there.)
To Ben, I'm not very familiar with Cold Fusion, but it looks about right to me, in fact you might even take it a step further and add a trim on the null condition to ensure you don't get emtpy space saved even if it is greater than zero length. Whether or not that's a good idea also depends on the circumstances, that being said I suppose there's an exception for every rule. In general though, nulls cannot be avoided unless you make every non-string field required.
I am running this query in CF
<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #Val(CompanyID)# and
date_approved is not null and
CompanyName is not null
</CFQUERY>
But this query is still returning CompanyName(s) that have [blank] value. I do not want companies with blank nor Null values.
You can test drive my Beta site -
visit this link http://www.naperlist.com/directory/search_main.cfm and press search w/o selecting anything. You will see some companyName that are blank. I do not want to display these companies that are blank, nor Null - is there away I can fix this with CF. Please advise.
Thank you so much for any input you may have.
-AJ
Hi AJ,
that problem arises from putting blank values where a NULL should have been... you must check against both options:
<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #VAL(CompanyID)# and
date_approved is not null and
CompanyName is not null
AND CompanyName != ''
</CFQUERY>
A few tips on the code, if you don't mind...
I would recommend putting the data source name in a variable and setting that in a central location. This way, if you have to transfer your app or simply use another data source, you do not need to change 100s of pages:
<cfquery.... datsource="#myDSN#">
VAL() is not necessary, if Company contains a numeric value:
WHERE CompanyID = #CompanyID#
It is considered best practice to use <cfqueryparam>. Not only does it make your code more secure (prevents SQL injection attacks because it evaluates the parameter type), it also speeds up query execution. (Although I must admit that with a small query like this, performance should not show much difference):
WHERE CompanyID = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#CompanyID#" />
Last, but not least, I alway recommend to write the SQL keywords all UPPER case and only put 1 statement on 1 line, to improve readability. That includes ANDs and ORs for me ;-)
SELECT ..
FROM ...
WHERE ...
AND ...
ORDER BY...
HTH,
Chris
@AJ,
You can check the length of the field:
CompanyName is not null
AND
CompanyName LIKE '_%'
The LIKE '_%' ensures that there is at least one character. You may not even need the NULL statement in conjunction with this.
Ooops, I didn't see Chris's post... he has some good stuff there. He also compares to the empty string != '', which is good - something that I forget to do occasionally.
I hate to be so snarky but if you are one to confuse the empty string with NULL, even the slightest little bit, then please, for the love of God, don't go anywhere near databases, especially as a developer.
2¢
The logic of not using NULL because you need extra work in your query is the worse logic I ever heard. And when thinking that your extra work is only one extra line that is ridiculous. Sorry dude but there is a reason I think for NULL to be there and if it is there its purpose is to be used.
Using your same logic I can give you hundred of examples that will make starters to think that it is bad to use int, char, nvarchar, uniqueidentifiers and so on.
@John,
NULL values do have a time and a place. AND, with every situation, there is a trade-off. My argument is that for MY purposes, most of the time, the existence of a NULL value only adds overhead RATHER than adding useful functionality. By all means, if the NULL value adds functionality, then use it.
My statement that NULL values are bad is not meant to be a blanket statement. On the contrary, I happen to love NULL values for things like date/time fields because to me, that is a really meaningful usage. However, with things like Last Name or First Name, they simply don't add any useful functionality - they only add the additional overhead / logic to your code.
So, I ask you this - if they serve no purpose to ME and MY situation, would it not be foolish to keep them in just because NULL was invented for a reason???
Hello Ben, Chris & Everyone -
I appreciate your valuable input. My website is up & running now. After reading your comments, I decided not to use Null altogether, which really helped me with my coding. In my situation, when a user creates an account for Classifieds, they have an option to add their company info as well (see this link http://www.naperlist.com/accountcreate.cfm ) if they choose not to, then the company name must use the value NA. However, if they deleted NA & kept the field blank, then behind the seen I check for Blank, if true, I insert NA in the company field. As something as simple as this, it did save me time & the headaches as far as Null value is concerned.
FYI - I am not a CF developer, I've only used it for about 2yrs, I am actually a Sr. Oracle DBA, I have been in the field for about 12yrs now, but I really felt in love with CF the day I put my hands on it.
By the way, please visit my website when you have chance, I would welcome any suggestion you may have, I'd appreciate that..
I will certainly come back to this cool Forum anytime I have CF questions.
Thank you all!!
-AJ
AJ,
The problem is your site is broken for people who work for a company named NA, or perhaps Northern Air or Norton-Arnold. If this field can be left unpopulated, then you should use NULL for the back-end representation.
Jack
While I disagree with the use of NA, I would say use NULL only if you can have a valid company name that is blank. Unless of course, you just want to use NULL.
Not quite Jack. When you create an account thru my site, the company info is Optional, therefore, when someone do a search for a particular item, all companies with NA are excluded from the search, no companies with NA will get displayed in the search results.
The way I setup my system is that a member can only create an account for Classifieds OR he/she can create a Classified Ad as well as a Biz profile. The goal is to provide Classifieds for free, and sometime in the future, i will need to charge a reasonable fee for the directory profiles - Someone has to pay for the expenses afterall :-)
Give it a try to see what I mean:
http://www.naperlist.com/directory/search_main.cfm
Hope this clarifies your question. Any other suggestions, please post them online.
Thank you.
-AJ
One more thing Jack.
How many companies within my community that are called NA? no idea. If that is the case, they will need to spell out the whole name. Or they can call it NA Inc.
Actually this happened to me - I wanted to create a Yahoo account with the word NETWORK, guess what, I got an error messge that I can't use the work NETWORK. I am sure there are lots of companies that use the word Network (i.e. NetworkSolutions) - But did Yahoo give a damn about that, NO! They had to do what needs to be done to keep their system clean (no special word conflict) or for whatever other reasons..
Therefore, I chose a different name and life goes on.. :-)
-AJ
Or, you could just use NULL, for the exact purpose it was intended.
One of the main reasons against NULL values and a problem we are dealing with at my work right now is the fact that NULL values are not included in INDEXES. So if you have a small percentage of your table where a column has a NULL value and you have an index on that column, the optimizer will not use that index - it will force a full table scan, degrading performance considerably. I know this definitely applies to Oracle DBs - not so sure about the others though.
Hi, everyone.
Although it's an old topic. I have some thoughts which I'd like to share with you.
First of all, a database is only "a database". Data modeling and database design should not be so much involed with presentation technologies. I know some web plaftform supports direct database manipulation. But in my opinion, that's really for fast prototyping purpose. A full feathered production, or system, always is a distributed system. For example, SOA call for design the inferface according to message requirement, that is the data contract. A database always need to serve many customer, ofcourse, it doesn't directly exposed it's schema, but use a middleware instead.
I give a scenario I'm facing to currently.
A database has a custom table. Firstname and Lastname are two columns. It separatly "John" and "Smith" for custom "John Smith". You need to show "Smith, John" for service A. But I'm a chinese. My name is "Baoshan Sheng", "Sheng" is my Lastname. For service B, if my name is displayed as "Sheng, Baoshan", I feel the company doesn't respect me very well. ( The same as "Hu Jintao", not "Hu, Jintao" ).
How to handle this scenario?
You can add some algorithm to handle asian name convention, and ofcourse, a column "IsAsian".
But there're many different regions that our knowledge doesn't cover yet. So I decide to add a column name "RealGeneralName" to directly stored the "dictionary order" name spelling.
For service A, we can use SELECT ISNULL(RealGeneralName, Lastname + ', ' + Firstname) AS GeneralName if we store a NULL in RealGeneralName for most people whose name is regular. That means we hide the real schema to the service b's middleware. It's intuitive and it's performance is better than expose both the three column to the service middleware. (This may be controversial.)
For service B, e.g. an customer entry service(sales service?), we expose the three columns directly to our sales. Salers can handle different conditions.
If we don't allow NULLs, and use empty string instead. the SQLs will be more ugly - it'll no longer a scalar expression. Am I right?
That's a schema I designed at the current stage. I'm not sure I'll not change my design in the future. But I do like to share these thinkings with you. Comments are welcomed.
Baoshan Sheng. Pardom my english.
If you are storing contact details in a table and you have a field Fax for example, if your contact does not have a fax machine shouldn't you put NULL in this field?
@Paul
Hey Paul
When the user doesn't supply information it would be a lot smarter to have a generic fax number ex.) 000-000-0000 or however you format your number.
The reason being
Assume that you check to ensure that the number is valid. If you return a NULL value it's obviously not going to be valid. Oh course there are always ways around it but its best to stay consistent.
@Jody
I respectfully disagree. While it may be more convenient to have '000-000-0000' instead of a NULL, it's neither good nor correct.
NULL marks the absence of a value, so, if a user doen't provide a value, NULL is what to store in the database. '000-000-0000' on the other hand is a value, it generates data where the input didn't provide it.
I know that the implementation of NULLs in RDBMS is, well, less that optimal (especially if you look at aggreta funcitons), but, they exist for a reason. All other models would either have introduced additional complexity (like multiple types of NULL) or would have resulted in too little complexity (like omitting the NULLs altogher).
Plus, one more practical reason: how are you going to handle fax numbers that do not follow your formatting? ;-)
Cheers
Chris
Has no here heard of foreign key or unique constraints?
You get away with FKs and NOT NULL by forcing the foreign table to have a special reserved value which means "not there" but that's a horrible kludge that will keep on giving and giving.
However, without NULL you can't have have a "unique if it is there" constraint unless you want to write your own CHECK constraint.
Yeah I know I'm a couple years late to this discussion but these are important data integrity tools and they often won't work without NULL.
@Chris
I agree with you 100%. However, I was referring to a value that would be "REQUIRED". If you return a NULL value it didn't meet the REQUIRED attributes, right?
I'm not just referring to the FAX number alone. I'm referring to POSTED values, and/or any other types of values that you want to validate.
To answer your question about proper validation Coldfusion has a wonderful function called validate which is added to the input tag and you can set it to validate phone numbers. Simple enough, or you could also validate the number using IsValid in a CFIF statement.
@Eric,
I am not sure that I understand what you are saying. Any date column that could potentially be NULL would probably not be a good candidate for a uniqueness constraint.
Can you give an example of what you mean?
@Ben,
I'm not talking about dates in particular, I'm talking about NULL
in general.
For example: say you have a blog/article/page/... that can have a comment
thread but doesn't necessarily need one. In the database you
will have "blog.comment_thread" FK'ed to "comment_thread.id" and
"blog.comment_thread" will allow NULL values. You could reverse the FK
but then you'd have problems if you want to attach comment threads to
other things. Either you use NULLs or you use a special reserved value
(i.e. a kludge) or you neglect referential integrity.
How about a tree structure that needs to be stored in an SQL
database? You're going to need NULLs for the parent and/or child pointers
or you're going to have to use a special reserved value (i.e. a kludge)
or you neglect referential integrity.
How about a simple user or user-group based ACL. You could use one ACL
table for per-user access and another table for per-group access but that
can be cumbersome to work with; you could also represent ACL entries as
(id,type) pairs but you'd lose integrity checks unless you had an external
sanity checker or CHECK constraint that knew how to interpret the pair.
Here's an example for a UNIQUE constraint. Say your customer records allow
each customer to have an SSN (or SIN or email address or driver's license
number) but don't require it. Furthermore, you want the SSN to be unique
to avoid duplicates and data corruption. If you're scared of NULL then you
have to ensure the "unique if there" constraint outside the database or
by writing a CHECK constraint (and if someone is afraid of NULL then what
are the chances that they're going to write a proper CHECK constraint?).
I think someone else already mentioned the significant difference between
zero and NULL in numeric columns.
The point is simple: use NULL where it makes sense. Going through a pile
of kludges, contortions, and justifications just to avoid using NULL is
foolish. Dogma helps no one.
Most of what I'm saying is about data integrity, referential integrity
in particular. Data integrity is very important: fixing broken code is
easy, fixing broken data is a nightmare.
@Eric,
I'm not trying to say that NULL values should never be used. There are certainly times when they should and have to be used; a binary tree might be a great example of place where it should be used (although I'd probably argue that a parent node ID of zero would / should indicate no parent just as well as NULL would).
While at first glance, I can see how something like a column that must be unique requires NULL (assuming that NULL values can be repeated?); but, what happens when you want to start using soft-deletes where a table might have two records with the same SSN but have different "is_deleted" flags for soft delete? At that point, you can't have a unique column value.
Or, you could spread the uniqueness to SNN+is_deleted. But that doesn't make sense because you could have multiple delete rows with the same SSN+is_deleted value if they are all deleted.
But, to your point, I think we believe the same thing: use NULL where it makes sense. I agree completely. I think we just have different opinions about where NULLs make sense. I don't think they make sense in a lot of situations (not ALL, just a lot).
I simply cannot agree in any way that there is fundamentally something wrong with allowing NULL values in a database.
"blank answers are not stored... but are instead converted to no answer."
"No answer" is a very valid (and in some situations, important logical) distinction between a blank value.
As far as conversion, that's up to you and your logic. If there is a value in a field, and a user blanks out the field and saves the record - it's up to you whether to convert that to null or put in an empty string (which would allow you to tell that it used to hold a value and no longer does, vs null meaning there has never been a value placed in that field).
@Paul,
Exactly - value vs. NULL is different at the philosophical level. However, it is up to you as the developer and application architect to determine if that philosophical difference actually translates to any meaningful difference in the application.
@Ben:
You said "There are certainly times when they should and have to be used; a binary tree might be a great example of place where it should be used (although I'd probably argue that a parent node ID of zero would / should indicate no parent just as well as NULL would)."
If you do it this way then you have to add a row with ID=0 or not use an FK and ignore referential integrity issues. Then you have this magical zero that does what NULL is supposed to do but it isn't NULL. Sounds like pointless complexity to me, pointless complexity that will only confuse anyone that is already comfortable and experienced with SQL.
If you were doing this sort of thing in C or C++ you would use "NULL" (or a raw 0 depending on local conventions), in perl you'd use "undef", in JavaScript you'd use "null", in python you'd use "None", etc. so why invent something special for SQL?
"NULL = NULL" is false so you can have as many NULLs as you want without violating a unique constraint. If you need a "soft delete" then you can NULL the offending column, dump the deleted-but-not-really data in another table, use a CHECK constraint instead of a UNIQUE constraint, or invent another pathological example :)
The next obvious question is "How many angels can dance on the head of a pin?" And after that we can move on to "which text editor is the best?".
I personally believe that NULLs do have a purpose (which is not negated in this article), but should be used sparingly.
I say this for two reasons:
1. NULLs actually do take up space. In MS SQL, I believe they are stored in sets of 4 per byte per record; so 1 to 4 NULLs in a record equals 1 byte, 5 to 8 NULLs in a record equals 2 bytes, and so on.
2. As Ben points out previously, having NULLs everywhere leads to data with no integrity.
When it comes to data that is not supplied on a regular bases (eg: fax number, passport number), I personally choose to use the normalization method of placing these fields in there own table (eg: userPhone, userIdentification).
This approach makes your architecture easier to add to in the future. For example, say that you currently only hold one phone number for your users. What happens when you need to start storing home, work, fax and mobile numbers. Well, you could either tack on 3 extra fields to your user table or in your userPhone table with a composite key of userID and phoneType, you could add 3 more phone types.
With the soft-delete example, I am assuming that you are soft-deleting a record and not a column (like updating a CMS and keeping previous changes); can I recommend creating another table (perhaps in an archive database) that then has a date deleted field which is apart of the primary key/uniqueness constraint on the archive table. This keeps your original table clean and the transfer to the archive table can be automated with a bulk query-based insert via a delete trigger (the delete trigger can be in SQL or in your application).
@Eric,
I see where our miscommunication is coming from. In my world, zero is never a foreign constraint. Therefore, anything that has a foreign key of zero is not pointing to anything. Therefore, referential integrity is fine, just slightly different than yours.
Also, I don't think that soft-deletes are bad examples? Are they? I use them all the time in applications.
@Eric, @Andrew,
If storing "soft delete" rows in an archive table is a best practice, I am all for it. I have not done that personally, but I am here to learn most definitely. Is that what you guys normally do?
@Ben,
I use an archive database and delete triggers for soft deletes and have found that it has 3 prominent advantages, it is:
1. Still possible to keep referential integrity.
2. Easier for someone else to see what you are attempting to do.
3. Retrieving information can be achieved from 2 simple queries.
@Andrew,
I can see, especially, that having a delete table would decrease the size of the primary tables and make the SELECT queries less complicated. I don't know very much of anything about triggers (I usually put all that in the ColdFusion code); perhaps it's time to bone up on my database architecture.
@Ben,
Database triggers are like stored procedures that get called with some extra information on what has changed when an event occurs.
The only benefit that I can see with doing it in the database instead of in the code, is enforcing the data to be archived with every delete.
However, if you actually want to hard-delete; you have to either delete from both tables in separate transactions (implicit or explicit transactions) or temporarily disable the trigger before deleting.
I guess it comes down to where you want your business logic.
Does anyone know how to set a default value using Alter Table sql.
ALTER TABLE addresses
add city varchar(64) default noCity
is not working:
Throws: ][ODBC Microsoft Access Driver] Syntax error in ALTER TABLE statement.
CF 8, Access DB
Thanks
Your question is rather off-topic... however, you can't set the default value in the same ALTER statement as you create the column, as it's actually a constraint and not just a property of the column. Here is what you'd want to do after creating the column:
ALTER TABLE addresses ADD CONSTRAINT
DF_addresses_city DEFAULT 'noCity' FOR city
"DF_addresses_city" being the name for the new constraint, the rest should be self-explanatory.
Thanks Alex, i think this may still be on-topic, as lot of developers will want to avoid Null fields after reading this blog. Knowing how to specify default values in SQL is surely the key?
Your explanation on how to create columns with default values is great thanks, and will be very useful.
David
Dude, NULL is VALID. Not understanding how to code with NULL is not valid.
Consider a survey.
If the user fails to answer a question, do you store some arbitrary value to mean "unknown" or do you store NULL?
What if your survey_answers_received table has a foreign key field that points to the survey_answers_possible table to describe the answers per question? No arbitrary value is valid in this case, unless you put in an arbitrary "no answer provided" field in the parent table, but that's lame.
The point is, NULL means "Data was not provided". Any other arbitrary value set to mean that must be defined somewhere in the code base, which is itself ridiculous. My favorite is when people come up with multiple levels of "Data was not provided" ([-1, -2, -3], for example).
Intelligent coding means dealing intelligently with all possible inputs and outputs. NULL is a VALID input and a VALID output. These excuses are ridiculous: "I don't feel like dealing with it." "It's too inconvenient." "NULL doesn't feel like a safe thing to put in a record." "Cold Fusion doesn't treat NULL correctly."
In fact, that last excuse is an argument against Cold Fusion.
Sorry to rant,
Way at the top you provided a lame example with middle name. You argue that you should store the empty string that a user input so you can't count the number of users who input an empty string. Should you also store a string of 150 spaces? Is trimming all spaces to store an empty string not the same as counting an empty string null?
Would you like to replicate a database full of nulls or a database full of empty strings? Which do you think will replicate faster?
The best argument against NULL is that there's no reason to store what you don't know. But, that argument does not suggest putting in arbitrary "unknown" values. That argument is for intelligently designing your schema to minimize the unknowns in the data.
(In fact, in my survey example, I wouldn't store NULL for unanswered, I would store nothing for unanswered.)
I think there are plenty of reasons not to use NULL in a database.
For example, I don't use NULLS for the different name fields because if I want to create a query that concatenates first + middle + last, (like for a name drop down box) those names that have NULL as a middle name won't show up in the list.
Also using nulls often requires more overhead or extra SQL manipulation to work with. I either have to deal with them on the SQL side or do something like check for isNull and empty string in the code when I want to display information.
Also on forms where you want to insert data into the database, if I wanted to make my middle name null I would have to do something like this in .NET
Dim MiddleName As String = IIf(txtMiddle.Text.Trim = String.Empty, Nothing, txtMiddle.Text.Trim)
Insert(MiddleName)
When if I just allowed empty strings I could just do this:
Insert(txtMiddleName.text.trim)
The extra overhead on the GUI side is not acceptable to me.....people want fast applications so I cut out as much as possible.
Dates, Boolean, those types of fields make sense to me to use NULL however.
I think it all matters in what your application is supposed to do and what is most important to your users.
@Brandon,
I have nothing against NULL values when the serve a purpose other than simply philosophical. That's all that I am really saying. If there is a business reason for your app to store NULL values, then go for it, there's nothing wrong with that.
http://tkyte.blogspot.com/2009/10/back-from-oracle-openworld.html
Hi,
It was a good one.I have tip regarding to database to share with you.
Consuming No space for storing NULL value in table
when designing a table, we create columns but later some of these columns are not used or rarely used. These columns always take NULL value. Storing NULL value in a column takes memory space leading to wastage of memory. To avoid this there is a new feature present in SQL Server 2008 i.e. Sparse Column.
More on this :
Storing NULL in a sparse column doesn't take any memory space at all. This feature provides a highly efficient way of managing empty data in a database by enabling NULL data to consume no physical space. When you declare a column as Sparse and any time a NULL value is entered in the column it will not use any space.
EX:-
CREATE TABLE PersonInfo
(
PK_PRIMARY_KEY INT PRIMARY KEY,
FIRST_NAME VARCHAR(100) NULL,
LAST_NAME VARCHAR(100) NULL,
ADDRESS1 VARCHAR(100) NULL,
ADDRESS2 VARCHAR(100) NULL,
ADDRESS3 VARCHAR(100) NULL,
ADDRESS4 VARCHAR(100) SPARSE NULL,
CITY VARCHAR(50) NULL,
STATE VARCHAR(50) NULL,
RACE VARCHAR(100) SPARSE NULL,
RELIGION VARCHAR(100) SPARSE NULL
)
In this example FIRST_NAME thorugh ADDRESS3 and CITY, STATE are required, ADDRESS4, RACE, RELIGION ARE rarely used. When creating the table for ADDRESS4, RACE, RELIGION we are using SPARSE option, since we know this date will be rarely stored.
N.B:- Sparse column not applicable for following data type :
geometry, geography, image, ntext, text,timestamp, user-defined datatypes
thanks
Eliza
@Eliza,
Oh cool, I have not heard of that column type before. Unfortunately, I have switched over to MySQL for many projects and cannot take advantage of this.
I am an analysit and have come across this issue lately. I use a third party tool to create reports. I just recently learned that some of my reports are wrong because I have been filtering on null data. After looking at the database, I saw that a lot of the null data is stored as an empty string (and therefore, not returned in my results). While I can change my logic of my reports to accomodate this issue, we have many end users who create their own ad hoq queries with simple drag and drop technology. they can easily set a condition on a field to say is null or is not null. These are non technical users. Teaching them how to set a condition of length = 0 is not something that can easily be explained to them nor will they remember to do it all the time. I have been creating reports off of many databases. This one database is the only one that has this issue.
@Randi,
While I can appreciate your specific situation, I personally am not a fan of a situation where people have the option to run their own ad-hoc reports on the database. This is, honestly, a recipe for problems; complex joins, business-critical understanding of table relationships - there's a reason that companies have DBA and pre-compiled reports (or applications that are built just to help people run specialized reports) - this stuff is complicated.
I don't want to diminish your situation at all; but, in general, I wouldn't say that your database strategy should be tailored to accommodate laypersons.
Nulls are great!
They are excluded where you expect them to be.
As per your examples, if you wish to include them, you need to use the sql server isNull() function to replace the nulls with blank strings.
Theres also an opposite nullIf() function for replacing values with nulls when they match conditions - when used in the right place, can be very handy.
I've rewritten some of your examples:
blank string names are left as blank strings
null names are converted to blank strings
-- len_count
select count(*)
from test t
where len( isNull( t.name, '' ) ) = 0
-- 5
-- combo_count
select count(*)
from test t
where isNull( t.name, '' ) not like '_%'
-- 5
In this example, blank string names are converted to nulls using nullIf()
-- has null or blank name
select count(*)
from test t
where nullIf( t.name, '' ) is null
-- 5
-- has a name
select count(*)
from test t
where nullIf( t.name, '' ) is not null
-- 4
-- has a name and is not simply whitespace
select count(*)
from test t
where nullIf( lTrim( rTrim( t.name ) ), '' ) is not null
-- 4
@Mike,
If you enjoy using NULL values, then go for it; from a technical standpoint, there's certainly no reason to not use them if you like them.
All I'm saying is that I think a lot of times they don't add any business value.
@Ben Nadel,
Are you saying that correct data doesn't have any business value? Nonsense. What about referential integrity and foreign keys?
@Eric,
By all means if you need referential integrity and foreign keys for things like "middle name", then you should use NULL values.
In the applications I build, 99% of the time, I there is no meaningful difference between a NULL "middle name" and an empty string.... it simply adds no business value.
Of course, it all comes down to what you are trying to accomplish.
@Eric,
I think we should just agree that there are no cross-the-board rules on this. NULL values are good when they add value. That is highly contextual - there's nothing about NULL values that is implicitly useful.
@Eric, @Ben
I read all your posts and learned alot from this 4 years old useful discussion.
My question is specific to the point.
I am storing enrollment data for local and international students.
Both student types share a number of common fields like fname,lname,DOB etc, however, there are some foreign key field which are specific to local students.
Incase I keep them together, the foreign key fields will have repeatedly Null for international students. Is that a good practice?
If not, then what I thought was that if I should create another table with StudentID along with all other Foreign Key fields( this way only records for local students are stored in that table and we will have no nulls or possibily minimum number of nulls).
I would appreciate if you also email your reply.
@Ahmad,
That's a really good architectural question; and, one that I don't know if I have any great answers on. You could go the route where all of the local student data is factored out into its own table so you have a sort of "piggy back" table that joins to *some* of the rows in the student table. Or, you could put the foreign keys in the student table and have a bunch of them as NULL (or zero in my personal preference).
I am sure there are some disk-space issues to consider (though in this day and age, I am not sure if those are relevant anymore).
I've done both approaches over the years and they both worked out. I vaguely remember just getting tired of always having to perform a join to get the full "local" student data; but, you can always create a VIEW to pre-join a lot of that if you care to.
I'm sorry, I don't have any great advice on this matter; I think this goes beyond what I understand as best practices.
The more sizable one record can become, the more it probably makes sense to break it out into its own table???
Hey,
I would like to know Under what business rule should one introduce NULL? Besides oracle provides NVL and NVL2 functions which can use to convert NULL value ,would you advise me to use NULL since i can convert its value?
For me, the difference between NULL and an empty string is fully meaningful.
I often use NULL for values that the user couldn't have entered while creating the record. For example, internal values that should only be filled out by an operator or automatic process, or that the user can't fill out without first going through some steps. Of course, I could define a default value, but often it's free-form input and I need to tell between an user-entered value (which could be just about anything) and the special "this field hasn't been modified by the user" (which can't be a string, and hence it's NULL).
I also use NULL for data types that don't accept an empty string as a value, but that are optional and can be left blank. Since blank is invalid, I use NULL.
Numeric data types often allow NaN as a value, but NaN should only be used to store the result of illegal calculations, like dividing by 0.
On the other hand, if you have a lot of optional, unbounded fields that seldom have values, it might make sense to use an additional table of keys and values to store such a mess, and store only the common, required fields at the main table. Of course, that means that the optional fields can't be used at a data table report without some additional magic, but the report would get rather cluttered anyways, and you could always click on a record to get the additional information. Of course, this doesn't excuse you from the potential need of looking for a value in those fields...
@Locoluis,
If NULL values make your life easier, which it sounds like they do, then I would definitely recommend using them.
maybe change your conclusion to:
DO NOT ALLOW NULL VALUES unless you know what you are doing. You will only be making things harder for yourself.
I agree with those that are saying that nulls are acceptable and valuable and that they mean no value for this field. For instance, I have a table with an expiration date field. However, some records do not expire. A null value means just that - no expiration date for this record.
@Colin,
I'm definitely fine with that. They have their place.
@Chuck,
Word up - I just created a date/time field in my database this morning that had a NULL value in it. To me, there is no value in that field that could be meaningful other than NULL (if no date was provided).
Basically what we are saying here is that:
If need to store a true/false value you would naturally choose a bit and have it exist always in the 1 or 0 state, if you allow NULL then you have created a 3rd state and it will no longer match what you are trying to represent.
For a text field, the field is either blank "" or has text. You don't want to add a third option NULL because now there is another option beyond what the object you are trying to represent needs.
On top of that you now have to worry about handling NULL reference exceptions, yuck.
I agree that for strings, and many other types that have a natural concept of an "empty" or "default" value, null should be avoided. But consider the case of a column that stores a customer's gender. Naturally this should be represented with a boolean. With gender, there is no "empty" or "default" value - you simply can't default to male or female, because that is imparting meaning to the data that is not there.
This means that every time an employee doesn't ask the customer's gender (a common occurrence at my company since it's not our policy to ask), your statistics get more and more erroneously weighted toward the default, which of course destroys the utility of the data you're collecting.
It also means that if at a later date you decide you want to start collecting this data, there is no way to know which records already have the data, and on which the data should be collected.
I don't see the point in arguing for or against the use of nulls. They have their place. It's your choice as a developer whether you will use them or ignore them.
The best thing about nulls is that you can use them however you want. They are simply another value. A null (eg. string) in one application / database / table might have a very different meaning to in another. It's all down to the documentation & implementation.
You might want nulls to store a "value not set".
You might want them to indicate "never been set".
Maybe even for cache style columns, "needs recalculation".
There are tons of real world examples that come to mind where I would use them. Some data types they make sense, some they do not.
If you had a gender bool, as mentioned above, the use of nulls is required if you cant guarantee male or female. Otherwise you'd have to use a string or something that has an acceptable blank value. Heck you could use an integer and have 0=unknown, 1=male, 2=female 3=both, 4=neither?
It really comes down to your db design skillz.
What about dates? Do you want a 1900-01-01 as your default "blank" date? Or a null?
Integer columns that are foreign keys and using a zero instead of a null to represent "no data" would simply constraint fail. Unless you were crafty and had a zero pk record in each table, but that's just nuts...
In sql server there is a misconception that nulls take up no space. They actually take up 2 bytes, for variable width columns. Fixed with, looking at you Char, store the same as any other value.
I can agree that nulls can be annoying and hard to work with at times. One example of a case where I might find them useful is if I had a table in the database that is updated by both at times by a process within the database itself AND at other times by a web form. I might want to, every time I encounter the field on the form and someone is filling it out go ahead and enter the empty string whenever someone doesn't fill out that particular field, and when fields are added by database processes, have nulls fill in those fields.
A more precise description of this case is that you have a table, and the data is generally filled in by selects from other tables also within the database (for example, data manipulations, joins, and operations performed on that data). Let's say you have one field...We'll use middle name. And that field has some nulls in the data when filled in from the database. Then, you have an administrative area on a website, which can either insert a new record into this table, or it can pull a record from the database and edit it.
One way I think this would be useful is if, say, you wanted to pull all records where no one has yet "entered" a middle name at all...whether an actual value with a len has been entered or an empty string with len = 0 has been entered. Cases where maybe you want to differentiate between null and an empty string.
Anyway. That probably didn't even make sense. But I was just thinking of a place where I might encounter them, and they might need to be allowed. But overall, I do find them incredibly frustrating and difficult to work with at times within code itself.
I always check values coming into my code for rational values, whether from a database or user input. Nulls are just a part of that.
A good function library can save a lot of pain.
I was checking both, I was just using the nulls to differentiate between the two. I guess you could use some other default value to do the same, but I was using nulls versus empty strings. When using default values (which I do use sometimes), if you are trying to differentiate between someone entering data in a form and entering an "empty string" in the field vs. pulling null values from a field in the database (or getting records through a join that results in null values), you have to really consider a default value that wouldn't otherwise ever be used, because otherwise, you wouldn't be able to differentiate between when the default value was entered automatically or someone entered it using one of the data entry methods, if you are checking for lack of data entry and need to update that particular field.
I know this was posted back in 2006, so I wonder if this guy Ben even has a job still. I think Ben is a Bozo. I couldn't get passed the first couple comments below the main article because the content was so flawed. In Ben's examples of tirelessly trying to defend his stance, he mentions cases where people ask for reports where the length of a column is 'X'. I have never in my career ever been asked by a business person (or any person) to pull data based on the length (i.e. give me all people with a SSN length of 0). Even non-technically savvy people know to ask for a report for persons that have no SSN.
I am on the side of allowing nullable values since it is very business relevent for someone not to provide a piece of data that is not required. I also feel it is completely inappropriate to save an empty string value in some cases (why should i allow empty strings to represent no value in varchar fields but be forced to have nulls in date fields). The only option that makes sense from a logical and consistency standpoint is to have nulls where values have not been supplied.
Now i have heard decent arguments about why nulls shouldn't be allowed, but Ben missed every legitimate argument in his comments.
@HiredMind,
I see no reason why we couldn't store an empty value for gender. The possible values would be "M", "F", or "" which would be unknown. Blank can mean that the data is not available, we do not need NULLs for this.
@Ryan,
I'm sorry, this argument is just silly.
You don't "need" NULL. You also don't "need" to the character datatype. You don't "need" to store gender using boolean, or integer, or pictures. Of course, you don't "need" to be a software developer if you can't be bothered using the tools of the trade.
One thing you definitely don't "need" to do is make your database an exact representation of a Web form. Your database is for storing facts, not for representing some Web form. If done right, the facts it stores will serve multiple purposes beyond one extremely limited form of input. You ~could~ use an empty string as a fact of some form field not being entered. You ~could~ use NULL. Which you pick probably depends on whether you are a disciplined software engineer, or a hack.
As I mentioned before, why not try representing your data in such a way that NULL isn't even an issue? It doesn't always work, but it's a good mindset.
I had to throw in my use case here for why NULL values are valuable in a database.
Someone stated this before and it bears repeating:
The intention of a NULL value was to indicate that the value, according to the DB schema, is not a known empty string or known zero value (or whatever that column type is), but that it is explicitly NOTHING.
I use this for a few reasons, all of which must be supported by proper business logic stored procedures or (if you prefer to code it into the app directly) properly coded business logic stored in the application's data abstraction layer.
Let's say I want to have a table that has a unique value or code that identifies an individual in a survey. I don't want to use email address because I don't know that yet... I'm passing out the survey with a unique code for each user to enter when they fill out the survey so I can ensure only one entry per unique code.
Now there are two ways I can set up this data structure: normalize, or convenient. In the case of one survey, I might opt for normalize to keep the data structure clean. In that case, I'd set up two tables, one with just the valid list of unique codes and another that would accept the answers from the survey entrants. My business logic would require me to check if that unique code exists in the entries table before letting the user enter the survey data, and if not present in that table, let the user enter their data and add their data to the table.
For one survey, this makes sense, and is the best option. Now let's take it to a different, real world scenario: 10,000 surveys, each needing different data, all of which need instant reporting of the data with no lag. I could normalize the heck out of the data, building relationships between unique codes, surveys, a set of common values for all surveys, individual tables for values shared only by a subset of surveys, etc., etc., but the speed of reporting on millions of rows in such a wide variety of tables would be sluggish. In a perfect world, my insert logic could insert the data into all of this normalized structure as well as consolidate it all into one table for each survey, but that would assume management allows for that amount of time for the project. In a real world scenario, the job must be done quickly... so what to do?
The convenient solution:
Each survey table has all of the columns, including a unique code column. There is only one table for each survey, and that makes the select statements fast (no multi-table joins to worry about). All fields except the unique code column are set as null (because hey, look at that - they are unknown values and not yet set... exactly what NULL is best used to mean!).
When a user comes to fill out a survey, I hit the survey's one table (indexed on the unique code), check if the Submitted_Date value is null for that unique value. If it is null, I let them enter their data, update that survey's table with the data, and I'm done. All data is contained in one table and I can easily select from the table only those records which don't have a null value for the Submitted_Date, I can do reporting via counts on various columns and null values are ignored in the aggregates, and all is really really easy to manage over the massive number of surveys.
So, to complete my extremely long thought, NULLs can be used if you know how and when to use them, use proper business logic wrappers, and document how and why they were used in the code.
Don't knock NULLs just because they cause headaches when used incorrectly, knock bad developers for causing the headaches!
Hi,
Here an example :
tblData (ID, Name, Class)
1 | 'Ben' | Null
2 | 'Tony' | Null
3 | 'Jessica Alba' | 'sexy'
4 | 'Shakira' | 'sexy'
5 | 'Gollum' | 'ugly'
On SQL Server, when you do
SELECT ID, Name
FROM tblData
WHERE Class <> 'sexy'
You could think that the query will return all records where the class is not 'sexy', like Ben, Tony and Gollum, but no! It returns only Gollum, because the class of Ben and Tony is unknown (null). Sound just a little bit weird to me : null is not equal to 'sexy', so...
But in the other way, I can understand why it's like it.
I think it's fine to use null when necessary, but you need to know exactly what you are doing.
Well u can use NVL i suppose
SELECT *
FROM (SELECT id,NVL(name,' ')nm
FROM test) t
WHERE t.nm NOT LIKE ' %';
I read some of the comments, to the point where Ben was suggesting using a special value of 0 instead of NULL, for missing ids. That just seemed so silly that I quit reading. That's inventing a local special value, which won't work with any existing RDBMS foreign key system, instead of using the conventional and traditional solution of using NULL -- it's using a private kludge instead. It sounds like a hack by someone who is not experienced with keys and constraints.
Then I remembered that it is humorous, but usually a waste of time, to read arguments made by people who only have worked with toy databases.
I know this article is old but don't get what your saying and would appreciate a reply.
I have a users table that stores username, email, password and a few other things and another table users_profiles where user profile data is stored, they use InnoDB and are related by the user_uid in both tables.
Now in the user profile table all fields can be NULL, apart from the user_uid of course, why ? because a user is not required and does not have to fill in any user profile details like first/last name etc.
If i set the columns to NOT NULL that would mean that field has to contain data in an UPDATE query as an example.
Either i am not understanding or something is wrong somewhere. Am i thinking correctly in what i am saying ?
@Mathew,
I think the consensus of this post has been that nulls are fine if they serve a purpose.
In your case they do serve a valid purpose,that a value has not been entered.
Nulls can clutter your code and make it less readable. Nulls can also refine your code and make it more readable.
It's the developers job to make the decisions.
I think that what is being argued is silly to an extent.
A NULL is treated differently by the database per the original post. And to argue that it should be used instead of something else is not going to surmount this single point.
Middle initial is not a good rebuttal. If none was provided, I may load it as "None provided" instead of NULL. ANd it serves the same purpose as NULL AND it overcomes the point that the original post was trying to make.
Hey, we didn't design the database to make NULL the way it is, we can only choose whether or not we want to use it.
The only rebuttal I read that makes sense to me is disk space. A NULL is less space than "No value provided" or even a shorter string like a blank space if you are dealing with a high volume database.
On that topic of an empty string, I don't like that implementation because as stated earlier, there is a distinction between it being sent that way vs. it was truly sent as NULL and if you can't tell the difference, you just lost grasp of some of the quality in your data.
As for the only appropriate rebuttal (disk space). Disk space is cheap. Confused customers or a query in your data warehouse that returns the wrong results thus resulting in an inappropriate business decision is far more costly.
What I think is crazy, and borderline arrogant is that developers think the database belongs to them? Excuse me?!? Who paid for the database? Moi (or insert your end user here).
WHo is using the database? Moi (Or insert your end user here)
You are just a crummy geek who develops code and thinks you are the next DaVinci of your era because you can cram it all into one line or use NVL function (yes, I'm quite aware of the damn NVL function).
But in reality, it is the user, the end user who uses the database. We type our own queries and run them...unless, of course, you are the asshole developer who thinks we are too dumb and we must used your spoonfed queries or reports that you make at your leisure according to some timetable that certainly doesn't suit my own.
Some of you commented that you use "real databases instead of toy databases". Bologne.
End users write their own queries and they forget to use NVL and they get "Gollum" according to the example provided above and they think that is the full dataset. THey then go off and make business decisions which affect the entire company (negatively due to the fact that this query resulted in missed data)....why? All because some arrogant developer insisted on using NULL values thus causing more work for the end user.
Get off your high horse, developers....start to think about the END USER!!! Start to think about the people who use your crap that you produce. Start to think about "MAINTENANCE" and "SUPPORT".
Having no NULLS reduces maintenance and support in just about every area where the field is used in the "where" clause. If it is a middle initial as the other end user pointed out...who cares? Nobody queries on middle initial in the where clause so by all means, leave it as NULL. But, for the love of mercy, don't put valuable fields which are commonly used in where clauses with NULL values.
One final comment for you "smart" developers who think that this is gibberish...let me intellectually pwn you with this final statement. WHat the blazes does the NVL function do? C'mon, developer...answer it without going to Oracle to look it up. It replaces the NULL with a STRING!!! OMFG...YOU ARE FORCING EVERY SINGLE END USER TO USE A FUNCTION TO FORCE THE NULL INTO A STRING JUST BECAUSE YOU ARE TOO ARROGANT TO STORE IT AS A STRING IN THE FIRST PLACE.
Look in the mirror...yes, you. You make end users spit on you.
According to most people here it seems the database schema should be changed for every end user depending on how they want not entered values to be seen. So for my 1200 end users I'll keep a database of who wants 0's for nulls, who wants empty strings for nulls, who wants 'None Provided' for nulls or all the other silly examples have been. Or they can simply write a ISNULL(field, whatever they want it to be). But I guess I'm being arrogant and should spend 8 hours a day doing adjustments to every table per user so they don't have to use a built in function. Lazy developers ftw...
Man, this thread is the energizer bunny of threads, it just goes on and on.
@Colin, energizer bunny is right. Now if only I could find a way to unsubscribe.
I would just add that database design decisions should generally not be made based on features of the language you are currently using. The fact that cold fusion converts null to empty string is irrelevant imo. I don't think making queries easier to write is necessarily relevant either. The design of the database should be based on what is the best way to represent your data. I use nulls because it is consistant with other data types and I don't think there is much value in storing empty string.
Very useful information.
IF there is null value then you can use ifnull(value,replacevalue)is very useful.Avoid null values .
Hey fellow geeks, please let me share my 2 cent's worth about defaulting DB fields to Null!
As a Programmer/Analyst/Developer, over the past 16 yrs, I've created many DB's, and have ALWAYS coded programs against the DB's I've created (I'm at a place with ~3300 people). Just want to share what has consistanty worked for me: When designing a database table, I set ALL fields to accept NULLS, except for the very few that MUST have a value for the logic to work (PK's, join and/or [sometimes] search fields, etc.). To really speed things along, you of course will want to INDEXT those fields!
Just to note: I've NEVER set an otherwise NULL field to some dummy/default value, and for the life in me, can't imagine why anyone would do so. When programming (e.g. SQL or VB), you'd then have garbage to work with.
I've found working with NULLS easy and painless!:)
My 2 pennies worth, hope it helps someone.
David
@David, I agree 99%.
I do sometimes use default value spaces, and integer zeroes when I know the values have a specific code equivalent ... like Enumerations and flags. Sometimes necessary in a properly de-normalised database.
Database first design has always worked for me too and nulls are not scary.
To end users who think the database should be designed for them ... no. It's designed for the accurate, reliable and efficient storage of YOUR data. Learn to use the tools you have been given or suggest better tools. Worse than screwing up a report would be storing data in a way that made a particular report impossible.
Colin
Reading this thread further I would have to say that Ben is way off the mark on his comments about using 0 instead of null for foreign key fields. There isn't too much in computers that you can say is absolutely wrong but this is about as close as it gets. Ben you really need to study up on database design if you still think this is a good idea. It's pretty much something that you should simply never do.
First of all this use can be resolved taking case to case . there is nothing universal .
if you use Null at DB End You have handle that at frontend . If you Convert it it Zero at DB End then you need some other parameter to understand whether Zero ( or Space for String) was the ACTUAL value or it is Null converted to Zero .
Why u need to know that ? Because Data is NOT only Data to save and show your single window customer . later on u need to do statistical works on it. it may be used for the purpose of making Market research and there Lots of Charts will come .
While drawing a Chart if Null comes as Null and Zero as Zero you can differentiate between Broken area and the area where the line touches any axis .
if in both cases DB sends "Zero" then it's just useless ! No DataPoint Area is not equal to 0,0 or X,0 or 0,Y .
got it ? So keeping the bigger picture in mind to the extra work of null handling from application layer . At least I do it and suggest my team to do so .
First of all this issue can be resolved taking as case to case . there is nothing universal .
if you use Null at DB End You have to handle that at application. If you Convert it it Zero at DB End then you need some other parameter to understand whether Zero ( or Space for String) was the ACTUAL value or it is Null converted to Zero .
Why u need to know that ? Because Data is NOT only Data to save and show to your single window customer . later on u need to do statistical works on it. it may be used for the purpose of making Market research and there Lots of Charts will come .
While drawing a Chart if Null comes as Null and Zero as Zero you can differentiate between Broken area and the area where the line touches any axis because of being the Actual Zero Value .
if in both cases DB sends "Zero" then it's just useless ! "No DataPoint Area" is not equal to 0,0 or X,0 or 0,Y .
got it ? So keeping the bigger picture in mind do the extra work of Null handling from application layer and keep it as null at DB. At least I do it and suggest my team to do so .
[@Moderator :please delete my previous post . had to repost it 'cause of Typographic Error in the previous ]
As both a database admin and an application developer, this is an argument I see often. Database folks are extremely data definition orientated, which is a requirement of being a DBA and storing data. Application developers are much more data value orientated, which is a requirement as they are presenting users with data.
Null values mean 'unknown', and an empty string means 'None' to a DBA. To an application developer, Null and an empty string both mean 'None'.
DBAs expect a concatenated field that contains a null to return null, while an application developer wants the field to contain whatever data is known, and to ignore nulls in the result. For example, in a 3 part name (first name, middle name, last name) of this value: Ben NULL Nadel
DBA's expect 3 columns of 'Ben', NULL, and 'Nadel' and expect the application handle the unknown of middle name. But Application Developers want to display Ben Nadel, because they know that much of the name, but if all 3 are concatenated in the application, the value will display null.
Each side is correct based on the primary goal of their role, which is to store data correctly, or display data correctly. But each side needs to understand the other role, so that data can be stored and retrieved correctly.
For example, it is perfectly valid to store a NULL for a middle name if it isn't known. However, if the middle name is known to not exist ( I know people without a middle name) then it is time to store an empty string, or perhaps have a bit flag that says 'no middle name' or a specific value of middle name that means 'no middle name' to the application. Another solution is to not concatenate 3 fields when one can be null, but them as 3 separate fields in the application or do parsing on the application side to break the name into parts before storing it in the database side.
So in a nutshell, the title of this blog post is correct, but the example given is very incorrect. Only allow NULL values when appropriate is spot on. That means that first name and last name should probably not allow nulls at the DB side. However, middle name should allow nulls, since most processing on the application side doesn't need a middle name. (when did you last see a letter addressed to you with your middle name?) But if the user provides a middle name though the application, or states they don't have a middle name, the database should store it, and the application should handle it.
When you start using Foreign Keys (for example) nulls are a must-have.
The point of saying don't use null makes me feel the author has no understanding of it's value. But that was long time ago so ... i guess he knows better now.
Also depending on strategy of inheritance null are required.
There are lots of uses for null values where no other value is correct.
Just thought I should add my 2 cents ...
I am currently mining data from a large data set which is inserted into a database. There is a lot of information which I can mine from one element which I cannot from the other, hence some of my records will contain a lot of NULL columns. It is more sensible than a blank string, as it indicates that I don't know what this property is.
You are so spot on!
NULLs have their place but NOT in tables--use defaults and/or force real values. Also, in most cases If something could possibly be NULL it should really be in another table.
This is NOT an opinion....
@Scott I'm feeling nostalgic of my Database Design college course...
Highly-normalized tables sound great on paper until you actually have to work with them in the real world in large-scale business applications. Besides, if "Another End User" can't be bothered to understand or wrap null-capable columns in NVL (VALUE, COALESCE, etc) functions, I doubt that requiring extra joins will make his day.
I'm not a fan of forced default values as an alternative to null, because there may (and I deal with many cases where this is true) be a need to differentiate between a blank answer, no answer (not asked, etc), and anything the user can type themselves (which could include "no answer" or any other common defaults).
By pushing the "lack of value" into the left outer join with no match, you haven't relieved anyone of work. The DBA now has extra database objects to maintain. The web developer either needs to complicate his/her query or make multiple SQL calls/updates. The query writer still has to deal with the null value from the join.
Even if you provide a database view that makes these joins, at best you have neither helped nor hindered the developers or users.
Don't get me wrong, I have nothing against normalized tables; I use them in various forms whenever practical. I'm just saying that the intersection of "what is taught in school" and "what the real world does" is not as large as you might think.
@Scott DBA,
Well, you should feel nostalgic if you were taught correctly.
Broad sweeping normalization statement: yes; fueled by theoretical ideals: yes; absolutely always practical in every case: no.
Nulls are simply a niggle of mine, not to say allowing them is so bad it's only that they are not truly needed (Can you need nothing???) and when nulls are not possible in the database tables your queries, application code and schema can be simplified. Thus, improving performance (every tick counts) and making the code base more manageable.
RE: I'm not a fan of forced default values as an alternative to null, because there may (and I deal with many cases where this is true) be a need to differentiate between a blank answer, no answer (not asked, etc), and anything the user can type themselves (which could include "no answer" or any other common defaults).
-- So, why are you saving something that does not exist….
RE: By pushing the "lack of value" into the left outer join with no match, you haven't relieved anyone of work. The DBA now has extra database objects to maintain. The web developer either needs to complicate his/her query or make multiple SQL calls/updates. The query writer still has to deal with the null value from the join.
-- Left right, reft light… If stored in a normalized manner, an inner join will expose all information that DOES exist, no need to consider NULLs in queries (read: is less complex). Aside, you have to maintain your objects? Yuck!
RE: Even if you provide a database view that makes these joins, at best you have neither helped nor hindered the developers or users.
-- Somewhat true when scale (data size, connection concurrency or even development team size) is not considered, your developers work in silos and/or you do not care about development time down the road.
RE: Don't get me wrong, I have nothing against normalized tables; I use them in various forms whenever practical. I'm just saying that the intersection of "what is taught in school" and "what the real world does" is not as large as you might think.
-- I've been in computing for close to 30 years and I convert data between software systems and develop schemas. I see all sorts of data storage patterns daily, from various flavors of old green screen systems to desktop and web apps. I'm not just talking theory and I do concede that maintaing such an ideal (strict normalization) can seem and might be impractical for certain storage needs. Normalization and nulls are related but different subjects.
Still, I have yet to see a null in (or is it not in…) a database field actually add to any system, anytime, anywhere.
I mean how could it? (that that does not exist)
Enjoy nothing! ;-)
@Ben,
Why make a blog when you don't know database design and architecture? Clearly you have not worked with anything other than CF. If you have a mixed bag in a column with bad database design you can still do the following in SQL.
SELECT * FROM contact where ISNULL(middle_Name,'') = ''
Like others have said you need application validation and consistency in how you store your data. Also is .net you use DBNULL to help you with storing and gathering null data in code.
@Andrew,
Why use the non-standard ISNULL when standard SQL has COALESCE? Especially when you're taking a shot at someone for not understanding SQL.
you, database designer, are making some people miserable in this world. there are real people that do not have last name, and they always have trouble filling up online form. especially today when everything done in an online way.
sighhh... :(
regards
@hendro,
good call. I apologize for all the pain I have put you through ;-)
There is an argument to make as many fields nullable as practical.
You never know when the rules will change and it IS legitimate for a contact to only have one name.
Storing a blank for surname is not the same a null.
I mean, try telling Superman that he should put in a surname to your database.
Don't be afraid of nulls.
I have a question for. anyone whom might be willing to give me an answer. I tried to search a song on my phone and it told me the keyword was in my null. I had this problem a lot with my other phone but one of our friends was a hacker so I assumed he was helping my boyfriend hide shit from me. but im clueless when it comes to these things. if anyone that has an answer for me and is willing my email address is ihateliarstoo@gmail.com. My name is April. I appreciate any help thank you
@Colin,
Thanks Colin,
Below is one of real miserable experiences:
I was buying an air ticket going overseas in sales counter, and she has to fill in my name in the order form, where surname is a requirement. so after some small conversation, she put it on the computer, my nickname as first name, and my real name (the correct one written in passport, ID card and birth certificate) as surname.
I have no problem when depart from my country. When the plane landed and I have to go through immigration to enter other country, I was questioned by an immigration officer.
And let me tell you that immigration officers everywhere is not the friendliest people in the earth. This I understand, as I was some foreigner who about to enter someone else's country. she/he questioned you and srutinized the difference in names found between passport and the one in the ticket.
after a long half-hour being questioned over and over, she/he finally released me and let me go, after giving a lot of suggestion not to do it again, which there is really nothing i cannot do about that.
Those are nightmares, man. And it did not end there, you will go through the same procedure again and again, EVEN when you are going out of that country!!!
regards
Wow, so many passionate answers by so many people. My last comment I intend to make on this post is this:
End users should not be directly accessing the SQL database unless they are trained in how to build reports, know the schema of the database, and understand the impact of nulls and other data behaviors (for example, an empty date value, in some SQL systems, is stored as 1753-01-01, others store 1900-01-01, and others store null).
Answer is simple:
Use NULL where it makes sense from a practicality point of view, and be sure you have proper views, stored procedures, and functions in place for end users to use for building out their reports. End users without deep SQL knowledge should not be querying data tables directly... they could put the system into deadlock, perform a query on a table with billions of rows against a non-indexed field (and bring things to a halt), etc.
There is a reason companies hire DBAs, SQL report writers, Developers, etc... they each have special skills that allow them to do that work effectively and without causing errors that can create high costs for the company.
It doesn't make sense for a Sales Manager to create a SQL query pointed directly to the SQL tables to get a report created... they should pass this to the SQL report writer who is trained to get them the correct data. In companies where there is no SQL report writer, and the manager is not trained in SQL to the extent that they can be guaranteed to write safe and accurate reports, the best option is to provide them with a limited subset of data via pre-built views so that their queries are limiting what data is pulled from these known good data sets. If they are trained to a limited extent, giving them the ability to join existing known-good data sets to build new reports is the safest option.
And to anyone who replies "but you can eliminate the need for NULL by normalizing your database" hasn't spent much time working in the real world with RDBMS. Normalizing past the 3rd normal form (and even strictly adhering to the 3NF) can often lead to expensive massive table joins, increases data storage, and makes the schema a nightmare once it grows past the simple schema most people work with.
Options to improve reporting performance do include data cubing and other denormalization processes, so you can have a perfectly normalized DB and still have reasonable report speeds, but at that point, you really need a solid business use case to support the enormous cost of building and maintaining a DB that is strictly in 4th/5th/6th normal form.
And that's my 2¢... take it or leave it folks :-)
As a Programmer/Analyst of 18 yrs, I FULLY SUPPORT and NEED my fields defaulting to NULL in my DB's, and I'm rather surprised such could even be questioned. After all, NULLS are a native/basic part of SQL queries used to provide end users with pertinent data. Perhaps important to note is that I soley drive all the code, and end users/recipients are read only. I could go on, but wife says it's time for bed! : )
@Hendro,
man, that is a tale of woe. Customs people have no sense of humour or imagination.
The issue with the airport check-in system was more one of rigid validation and an inflexible interface design.
I must admit I tend to design my systems with a compulsory firstname, lastname requirement and an optional middle name ... it works for most (all?) of the cultures in the world's naming conventions.
Can I ask why you only have one valid name?
Obviously that criteria is good enough for your passport.
@Jonathon,
don't pike out now man, the discussion is just getting started :-)
@Colin,
Why only have one name? ask my parent... and mother only have one name only also :) . The thing is, it is not something we chosen, parent chosen it for us when we born, and it is written formally in birth certificate. and that name will carry throughout your whole life, in ID card, passport, when registering for school, etc.. etc.. if it is different, even one syllable, then you are different person.
and in my country (indonesia) when we married our name is still the same, we do not inherit family name by marriage. in some culture we do inherit family name, but that also only if the parents gave it explicitly.
@Hendro,
I was not aware of that.
If there is just one exception to the rule, the accepted convention of Firstname and Lastname requirements needs to change.
And if 140 Million + Indonesians commonly only have one name then it becomes an issue for all interface designers to think about.
@Colin,
This might be of interest to you:
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
@Colin,
Unfortunately, we are outliers, only few of us exist even in my country, but we are real person! :)
Thanks for our conversation.
regards
@Ben,
"Sure thing boss, I will just grab all records whose SSN field has a ZERO length."
I'd see it as a bit of a red flag if anyone preferred ...WHERE LEN(SSN)=0 over either ...WHERE SSN = '' or ...WHERE SSN IS NULL regardless of the potential confusion over which magic no-data value you use. Why on earth evaluate potentially millions of string lengths and remove the opportunity to use an index when you don't need to?
Hi,
Of course you could argue the same about blanks... Don't allow them! Convert them to NULLS...
Then you can easily write the following SQL...
SELECT COUNT( * )
FROM test t
WHERE t.name IS NULL
And not worry about blanks...
Wow, I must admit that as a developer I am surprised that there are still discussions about NULL or blanks!
I have no doubt that when a field is not used (like MiddleName) it should be NULL.
I also know that this gives me problems now and then when I take over poorly designed database and/or applications.
Right now I have the following problem:
TableA has a column TableAId
TableB refers (sometimes) to TableA.TableAId - but only if the content in TableB has something to do with TableA.
Now - the previus developer made the decission that NULL are not good and therefore he inserted a 0 when there are no connections. Guess what - now I cannot create a Foreign Key between the tables unless I create a fake TableA row with the value 0 in TableAId.
Now I have to make some major changes to the application (using NHibernate which automatically inserts 0 in NULL fields if the property is not NULLable) and then change my database as well.
I call this bad design from the beginning! Of course there should be a foreign key between the tables and of course we should use the value NULL!