Considering Index Design And Database Uniqueness Constraints With Soft-Deletes In MySQL 5.6.37
As I mentioned in an earlier post about index design with a nullable DateTime column, I'm building a small scheduling app at work using MySql 5.6.37. The requirements for this scheduling app have been "unfolding" (aka, scope creep) as the project has continued. And, one of the newly-revealed features is that we need to keep track of "canceled" and "failed" events for each company in the scheduler. These alternate states - canceled and failed - aren't exactly soft-deletes; but, they are close enough in nature to merit the same terminology. The problem was, I had a uniqueness constraint on the companyID
column of the primary table; so, any attempt to track an alternate state for said company would violate that constraint. To get around this, I ended up adding a nullable isCurrent
column to the primary table and to the uniqueness constraint index.
PRIOR ART: Nothing that I talk about this post is revolutionary. In fact, my approach was very much influenced by other posts that I read while doing research in preparation for this task:
- Using Unique Database Fields with Soft Deletes by Bernhard Breytenbach
- Soft delete and unique constraint by Deepak Narayana Rao
- MySQL with Soft-Deletion, Unique Key and Foreign Key Constraints
That said, I still wanted to document this solution for my future self and to help codify my thoughts on the matter.
The primary table in this case is tracking "migrations" from one system to another. The first version that I built would delete records in the event that a migration was either canceled or failed. As such, the first implementation of the table was quite simple and looked something like this (slightly truncated):
CREATE TABLE `demo_migration` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`companyID` int(10) unsigned NOT NULL,
`createdAt` datetime(6) NOT NULL,
`scheduledAt` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `IX_byCompany` (`companyID`),
KEY `IX_byDate` (`scheduledAt`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Notice that there is a unique index on the companyID
. This would prevent a system administrator from accidentally scheduling the same company more than once, which would lead to all sorts of communication madness. In order for a company to be scheduled again, the system administrator would have to go in and explicitly delete the prior record before they could continue with the re-scheduling effort.
ASIDE: The business rules surrounding uniqueness are also handled in the business logic of the application. The use of the uniqueness constraint at the database level, however, is a fail safe that will catch unconsidered race-conditions and other bugs in the application layer.
Once it was revealed to me that deleting a prior migration attempt was no longer acceptable for the application, I had to figure out how to keep the old data around without violating the uniqueness constraint.
My first thought was to create an archive table - without a uniqueness constraint on the companyID
column - to which I would physically move the prior records. Such an approach can work well if the "archive data" is never accessed; or, is accessed in a completely different manner using a completely different set of database queries. However, in this case, the "active" and "inactive" records needed to be accessed in a uniform way. As such, trying to split the dataset would have created a lot more complexity in the application layer.
What I needed was a way to keep the old migration records in the same table without violating the uniqueness constraint. What I ended up doing was creating a nullable isCurrent
column as part of the index:
CREATE TABLE `demo_migration` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`companyID` int(10) unsigned NOT NULL,
`isCurrent` tinyint(3) unsigned DEFAULT '1',
`createdAt` datetime(6) NOT NULL,
`scheduledAt` datetime(6) NOT NULL,
`finalizedAt` datetime DEFAULT NULL,
`finalizedWithState` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `IX_byCompany` (`companyID`,`isCurrent`),
KEY `IX_byDate` (`isCurrent`,`scheduledAt`),
KEY `IX_byState` (`isCurrent`,`finalizedAt`,`scheduledAt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
NOTE: The
isCurrent
column needed to have a default value since I was adding it to an existing table with existing data.
In MySql, null
does not equal null
. As such, having two null
values in the same unique index is acceptable. What this allows me to do is have (companyID,isCurrent)
index tuples like:
(5, 1)
(5, NULL)
- Not a violation of uniqueness.(5, NULL)
- Not a violation of uniqueness.(5, NULL)
- Not a violation of uniqueness.
Now, when a scheduled migration is canceled or failed, instead of deleting the record from the table, I just change the isCurrent
column from 1
to NULL
(and I populate the newly-appended "finalized" columns with information about the state change). The uniqueness constraint now allows only a single "current migration" to be stored; but, will allow N-number of "non-current migrations" to be stored.
The one problem with this is that I now have a value in my application that is a bit "surprising". Meaning, I have a column - isCurrent
- that looks like a Boolean value by name; but, is in fact, more of a stateful value by implementation. The column isn't True
or False
, the column has states 1
and NULL
.
To minimize the leakiness of this quirk, and adhere to the Principle of Least Surprise, I'm coding the application layer to treat this value as a Boolean. Then, fully encapsulated within the data access layer, I'm performing a translation from Boolean to stateful. What this means is that my data access layer is responsible for translating True
to 1
and False
to NULL
.
Here's what that looks like on the way into the database:
<!---
CAUTION: The "isCurrent" field is playing a critical role in both the SOFT DELETE
functionality as well a the UNIQUENESS CONSTRAINT on the migration table. The only
valid values are "1" and "NULL". This is why there is some seemingly-odd handling of
the isCurrent value through-out this data access component.
--->
<cffunction name="createMigration" returntype="numeric" access="public" output="false">
<!--- Define arguments. --->
<cfargument name="companyID" type="numeric" required="true" />
<cfargument name="isCurrent" type="boolean" required="true" />
<cfargument name="createdAt" type="date" required="true" />
<cfargument name="scheduledAt" type="date" required="true" />
<cfquery name="local.results">
INSERT INTO
demo_migration
SET
companyID = <cfqueryparam value="#companyID#" cfsqltype="cf_sql_integer" />,
isCurrent = <cfqueryparam value="#booleanFormat( isCurrent )#" cfsqltype="cf_sql_tinyint" null="#( ! isCurrent )#" />,
createdAt = <cfqueryparam value="#createdAt#" cfsqltype="cf_sql_timestamp" />,
scheduledAt = <cfqueryparam value="#scheduledAt#" cfsqltype="cf_sql_timestamp" />,
finalizedAt = NULL,
finalizedWithState = NULL
;
SELECT
( @@Identity ) AS id
;
</cfquery>
<cfreturn results.id />
</cffunction>
As you can see, the consumer of this data access component (DAO), passes the isCurrent
property around as a Boolean value. Then, encapsulated within the DAO, I'm using the cfqueryparam
tag to translate that Boolean value into the appropriate stateful value needed for the unique index.
This is what that translation looks like coming out of the database:
<cffunction name="getMigrationByFilter" returntype="query" access="public" output="false">
<!--- Define arguments. --->
<cfargument name="companyID" type="numeric" required="false" />
<cfargument name="isCurrent" type="boolean" required="false" />
<cfargument name="scheduledAt" type="date" required="false" />
<cfquery name="local.results">
SELECT
m.id,
m.companyID,
COALESCE( m.isCurrent, 0 ) AS isCurrent,
m.createdAt,
m.scheduledAt,
m.finalizedAt,
m.finalizedWithState
FROM
demo_migration m
WHERE
1 = 1
<cfif structKeyExists( arguments, "companyID" )>
AND
m.companyID = <cfqueryparam value="#companyID#" cfsqltype="cf_sql_integer" />
</cfif>
<cfif structKeyExists( arguments, "isCurrent" )>
AND
m.isCurrent = <cfqueryparam value="#booleanFormat( isCurrent )#" cfsqltype="cf_sql_integer" null="#( ! isCurrent )#" />
</cfif>
<cfif structKeyExists( arguments, "scheduledAt" )>
AND
m.scheduledAt = <cfqueryparam value="#scheduledAt#" cfsqltype="cf_sql_timestamp" />
</cfif>
;
</cfquery>
<cfreturn results />
</cffunction>
Again, notice that the data access layer is using COALESCE()
(in this case) to translate the stateful value into a Boolean value as the records are being gathered and passed back up into the application layer.
Of course, after making this change, I then had to go back and add an isCurrent
check to most of my existing database queries. However, this was a fairly straightforward effort, especially considering that this internal tool that I'm building is quite small.
Over my career, I've implemented a number of "soft-delete" strategies, typically with an isDeleted
(bit) or deletedAt
(datetime) column. But, I've never had to do this within the context of a uniqueness constraint on a MySql index. I'm still building this feature; but so far, adding the nullable isCurrent
column appears to be working quite nicely.
Want to use code from this post? Check out the license.
Reader Comments
Hey Ben!
I've always used this type of soft deletes, although recently I applied a date column for that. deletedAt next to createdAt and updatedAt
I am however not entirely sure what exactly the difficulty is with this double id table.
I never liked the idea of hard deleting records, so I have actually always soft deleted stuff. Always a way to restore stuff if my application would delete stuff in error.
@Steven,
Yeah, the soft-delete as protection against the "accidental delete" is a really nice side-effect. In fact, that's something we run into a work a lot (where we don't have soft-deletes, but people delete things accidentally).
When you say:
... I am not sure what you are referring to? Are you referencing my concept of an "archive table" (where I move deleted rows to an archive table). If so, the issue that I would have run into in this case was that my "active" and "archive" items were going to be rendered in the same lists / views. So, I might link to:
some-item/{ id }
... where the
id
might refer to the "current" item or the "archived" items. As such, splitting that concept across two tables would have just complicated the code even more (in my opinion). As such, I feel like the soft-delete-esque approach made the most sense here.