Working Code Podcast - Episode 040: Are Database Transactions Overrated?
The other day, I was listening to an episode of the MongoDB podcast in which Mat Keep shared a story about the adding of atomic transactions into the MongoDB product. Mat said that the engineer who spearheaded the effort used to joke about the fact that his team was spending a huge amount of time working on a feature that 90% of developers would never need. For me - who leans heavily on transactions for referential integrity - this sounded like an crazy statement. But is it? Are database transactions overrated? Or, is it more so that the type of use-cases that work best in a document database are also the type of uses-cases that don't really need transactions?
On today's episode, the crew talks about how they use databases; the role of atomic transactions in the reduction of application complexity; and, whether or not they've ever felt "held back" by the limitations of a relational database management system. Full disclosure, all of the hosts have far more experience with traditional databases when compared to NoSQL databases.
NOTE: In the show, I mentioned that a document database like MongoDB can't enforce schemas like a relational database. And while this was true in earlier versions of the MongoDB product, it is no longer true. In recent updates, MongoDB has added schema validation and enforcement.
Here's a short teaser clip:
... featuring these beautiful, beautiful people:
- Adam Tuttle → Website, Twitter, LinkedIn
- Carol Weiler → Twitter, LinkedIn
- Tim Cunningham → Twitter, LinkedIn
- Ben Nadel (that's me) → Website, Twitter, LinkedIn
With audio editing and engineering by ZCross Media.
For the full show notes and links, visit the episode page. And, be sure to follow the show! Our website is workingcode.dev and we're @WorkingCodePod on Twitter and Instagram. Or, leave us a message at (512) 253-2633 (that's 512-253-CODE). New episodes drop weekly on Wednesday.
Reader Comments
@ben
Regarding the conversation about MongoDB transactions:
The way you phrased it, it sounded like the developer creating the transaction support for MongoDB was saying that 90% of users won't use the feature, but when you was reacting to that, you phrased it like you didn't understand why 90% of developers wouldn't need transactions. I see a big difference between "won't use" and "won't need".
I haven't looked over any of the MongoDB documentation, so please take the following with the appropriate portion of salt...I think that the difference (again based on the way that I heard you phrase it), is the difference is that most developers don't know about or know how to easily implement transactions into their workflow with MongoDB. Perhaps the easy path doesn't include using transactions, or the documentation either doesn't expose transaction early in the examples, or it's hidden away in an Advanced section that you may never "need" to delve into because you don't know you need to.
Of course, the default implementation fails so infrequently, that devs haven't been bitten enough by the pain of their database interactions failing.
One way that I think of Document databases is that they are sort of like a fully de-normalized database, you have everything that you need for an entity in one place. A discussion I've had with one of my team members about MongoDB is that for our application, that it would mean a ton of duplicated data, and their response was along the lines of "well, you could just use references to other documents", and my count-response was "congratulations, you've just invented a relational database, so why not use a database engine that's optimized for it, than one that is shoehorning it in"
We also have a few tables that have a JSON string stored in a column that's used for storing ad hoc settings for some feature. Haven't had the need just yet to make modifications such as key case changes, or structural changes to the generated object, but if we did, we'd be able to change the column type to JSON and be able to query on it, and (I assume) be able to make changes to the key case and structure using the appropriate mapping transform of the objects from the old style ot the new style.
Regarding not letting tables get wide....I feel that I keep losing the argument each time we add a column to the counter "it's just one column" even though it's just one column now, but we've maybe done it 20 times to a particular table. Maybe I need to work on my counter arguments for this issue. shrug
@Danilo,
This really hit a nerve:
I feel like this applies to so many aspects of application architecture in general!. Even with relational databases, I've (and my team) have fallen victim to this problem.
In fact, just earlier this week, someone from another team was pointing out some "dirty data" that we seemed to have in one of our tables: two records that looks exactly the same (less the primary key). And, when I took a look at the code, I could see that there was no transaction wrapping the logic that checked to see if a matching record existed before inserting the new one. Clearly, some user double-clicked a submit button and our server-side code wasn't coded to protect against the double-submission in rapid succession.
And, as you point out, in 99.99% of cases, that workflow never fails even without the transaction. So, the developer who wrote it never noticed the issue.
Now, I'm sure we could have also prevented something like this by included certain columns in a unique index. But, that's not always easy. In this case, we were trying to enforce "unique names" (to reduce user confusion). And, I don't love the idea of having to include a "name" column in an index (plus, I'm not entirely sure you can index an entire
varchar
field - I think only upto a certain size). As such, we tend to lean on transactions to help enforce that kind of business logic.@Ben,
For actions that are unlikely to have multiple submissions fractions of a second apart as part of the user's workflow, I handle the multi-click submission by disabling or replacing the button/link once it has been clicked (showing some sort of processing animation as well should the event take longer than a couple of hundred milliseconds). Also, if appropriate to the workflow, consider clearing the the content that gets submitted so that there is no content to send so the validation code can also help prevent duplicate submissions. If you're using optimistic processing, move to the assumed success state and repopulate the content on an unsuccessful submission.
And you still have to handle things in the API layer, as well as the database.
Best of luck in your new adventures you discussed in episode 039
@Danilo,
All excellent suggestions 💪
@All,
On Twitter, Mat Keep pointed me to a whitepaper he wrote on MongoDB transactions. In it, there is some discussion on how powerful transactions are in a MongoDB-based application:
This makes me feel that perhaps we are all more on the same page than I originally had thought. As I was saying to Mat on Twitter, the vast majority of my SQL queries don't have transactions; but, the critical ones do because I like to lean on the database for enforcement of certain constraints. I think what I was reacting to was the idea that "90% of developers don't need transactions", vs something that might be more palatable, "90% of queries don't need transaction".