Why I Avoid DEFAULT Column Values In My SQL Database
When defining a relational database table schema, each column can have a type and, optionally, a default value to use if no value is provided in a subsequent row INSERT
statement. When evolving a table schema, these DEFAULT
value mechanics make migrations tractable. But, as soon as I can, I drop the DEFAULT
setting from the column definition. In fact, I try to limit my use of DEFAULT
values as much as possible.
As discussed in an earlier post on using an existing column to define a DEFAULT
value, when I migrate my database schema, I use the following workflow:
Add a new database column with a sane
DEFAULT
value.Update my ColdFusion code to make sure that an explicit value is always provided with each
INSERT
statement.Drop the
DEFAULT
value from the recently-created database column.
If I perform steps 1 and 2 and skip step 3, my application will continue to work. But, I make sure to always perform step 3 because I want my ColdFusion application to be the source or truth. This is the same reason that I don't use stored procedures. The application code should reveal everything that I need to know about how the data is used. And, more importantly, what shape the data needs to be.
When the application code doesn't have to be explicit about the data being persisted, it leaves the code open to interpretation: was the given logic omitted on purpose? Or, was the given logic omitted in error? This lack of clarity is a powerful "code smell". It is an indication that something is wrong and needs to be fixed.
Conversely, from the database administration perspective, everything in the database schema should exist for a reason. If a future developer or DBA (database administrator) looks at the schema and sees a DEFAULT
value, they have to assume that the DEFAULT
setting is there because the application requires it. And that dropping said DEFAULT
value would break the application.
If having a default value is helpful to the application, the default value should still be defined explicitly in the application code. Meaning, the INSERT
statement must include every column value; and, any optional column values should be defined using fallbacks within the application logic and / or data-access function signatures. This way, the application code remains the source of truth while still providing some level of flexibility.
Aside: In MySQL, dropping a
DEFAULT
value can be performed as an online DDL operation. Which means that it doesn't lock the table or degrade the user experience.
At the end of the day, this is a matter of "clean code". Which is, itself, a matter of great subjectivity. For me, having DEFAULT
values in the database feels messy—like a half-formed thought in need of completion. The more I can remove DEFAULT
values, the more easily I can understand the true intent of the application.
Reader Comments
I agree. However, if there are multiple, different applications that write to the database, that "source of truth" would shift.
At my current gig, 100% of database read, write and search operations are done through Stored Procedures (MSSQL). This primarily helps with optimization, but also ensures concerns like default values are covered. (For our data warehouse, it also makes sure that clients don't create bad search queries that could run poorly.)
You make a strong point, and I am a fan of having a single source of truth...which is one of the reasons I find React so appealing.
I've mostly used DEFAULT for createdAt columns, which makes the most sense to me. But yeah... maybe I shouldn't any longer 🤔
@Chris G, Hmmm. That might be the very ONE thing I would consider keeping as a DEFAULT. Unless one wants to fudge the data for some reason. If you also have an identity column, though, that's going to be a lot harder. Heh.
@Will,
When you have multiple applications writing to the same database, that's a whole other box of worms 🤣 In that kind of scenario, I say you have carte blanche to do whatever keeps the data the most stable. I know from experience that these "integration databases" can get pretty hairy. To this day, some of the slowest queries on my production database at work are coming from another service that I don't have control over 😱
@Chris,
I think that's fair - these are all just my "best efforts", not hardened rules. And, I'll blur the lines where it suits me. For example, in most of my code, I'll generate the data/time values in ColdFusion and then pass them to the SQL as a parameter. But, sometimes, I'll just throw a
UTC_TIMESTAMP()
in the SQL itself and call it a day.@Will,
You mention "identity column" - that's definitely a case where I still just lean on the
AUTO_INCREMENT
feature of the database. I know that "best practices" these days say that the primary key should be some sort of UUID that is generated by the client. But, I just haven't wrapped my head around that yet.@Ben Nadel,
Oh, definitely IDENTITY is solid, we don't want to be calculating that in Coldfusion! Ha!
On the "UUID vs. IDENTITY (int)" argument, there was a time that I was interested in moving that direction. Simply because moving data between databases, creating population scripts, etc. would be far easier. Much like a [key] value, say a Status table:
1, 'Open', 'Open Records'
2, 'Close', 'Closed Accounts'
...etc.
The 'Open' is a key value, and never changes. But using the int field (StatusID), well... you don't have to make sure it stays the same now, because your code will always use the [StatusKey] value. Always... But having "type data" vs "user data" having disparate ID's across different levels (dev, staging, QA, prod)... it would make my brain hurt. The UUID use would definitely remove that "issue".
I asked our company back, ages ago, about using UUID instead of the int ID's. They simply told me that for the volume of data they had (in the petabyte-range), the UUID's would take up too much room and their experiments showed they were too slow. Some quick research shows that an INT field (though SQL Server doesn't support unsigned, which is a waste IMO), is a 4-byte field. The UUID is a 16-byte storage. If you have a LOT of records, that's huge -- 4 times the storage requirement for the column!
But if your database is smaller, and you want convenience of not having to mess with "INSERT IDENTITY ON/OFF" as you move between databases, the UUID's would be great.
@Will,
Everything you just said hits home with me. Even though my company has been in a situation in which having UUID would have made life a lot easier (migrating records from one system to another), I still don't think I would have used UUIDs if I could go back and do it all again. UUIDs solve a very specific set of problems that I just feel like I don't have.
Hey, Ben,
Add a new database column with a sane DEFAULT value.
Are you sure you mean to use the word, 'sane?'
@Gary,
I think so. By sane, I meant a default value that is relevant to the app. Preferably using whatever default value the application logic will be inserting once the code has been updated.
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →