It's OK To Be Super Explicit In Your Data Access Method Names
This post is mostly a note to myself - a sort of "permission slip" to relax the previous constraints of my mind. The other day, I was writing an Update method in a data access object (DAO) within my ColdFusion application. This update method was updating an optional relationship within my domain model. Meaning, the target row wasn't guaranteed to exist. As such, I decided to author the underlying SQL statement as an Upsert method that would use the ON DUPLICATE KEY UPDATE
semantics. This would allow me to either update the existing row or create a new row on-the-fly. Technically, this worked like a charm; but, emotionally, I felt deeply conflicted.
ASIDE: In the following discussion, I am using "Thing" as a placeholder for any arbitrary domain concept.
The problem I was "feeling" was that the method was called updateThing()
. Now, there's nothing wrong with calling a method updateThing()
- I use that form of method name all the time in my data access layer. However, with all my other "update" methods, most of the arguments are optional; and, a given column is only updated if the relevant argument is provided. As such, my CFQuery
UPDATE
SQL statements usually look a lot like this (truncated):
component {
public void function updateThing(
required numeric id,
string valueOne,
string valueTwo,
string valueThree
) {
```
<cfquery name="local.results" result="local.metaResults">
UPDATE
thing
SET
<cfif arguments.keyExists( "valueOne" )>
valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
</cfif>
<cfif arguments.keyExists( "valueTwo" )>
valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
</cfif>
<cfif arguments.keyExists( "valueThree" )>
valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />,
</cfif>
/* No-op assignment here to make the trailing commas consistent. */
id = <cfqueryparam value="#id#" sqltype="bigint" />
WHERE
id = <cfqueryparam value="#id#" sqltype="bigint" />
</cfquery>
```
}
}
With an Upsert statement, however, the arguments can't be optional because they need to be able to create a new, fully-formed row if the target row doesn't exist. As such, all Upsert method arguments have to be marked as required
.
And that's where my brain started to explode. I couldn't, in good conscience, author an update()
method that completely broke from the standard pattern of optional arguments. It was too much of a deviation from any reasonable expectation that another developer would have when consuming this method.
The name of the method wasn't telegraphing the intent of underlying query.
So my brain finally said to me: THEN NAME IT SOMETHING ELSE you big doofus!
And so, I changed it to upsertThing()
. And my brain said, Yay! Because, now, the method was doing the thing that it said it was doing and it became clear as to why all the arguments were marked required
.
component {
public void function upsertThing(
required numeric ownerID,
required string valueOne,
required string valueTwo,
required string valueThree
) {
```
<cfquery name="local.results" result="local.metaResults">
INSERT INTO
thing
SET
/* !!! Owner ID column might cause a key-conflict. !!! */
ownerID = <cfqueryparam value="#ownerID#" sqltype="bigint" />,
valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />
ON DUPLICATE KEY UPDATE
valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />
</cfquery>
```
}
}
This was an important moment of clarity for me because, normally, my brain craves consistency. As such, I wanted this data access method to be named the same thing as other data access methods. But, it's not the same thing. And that's OK.
What I'm seeing now is that data access methods that mutate data should never be surprising. They should say exactly what they are going to do; and then do only what's printed on the tin.
So, for example, in my previous post on handling MySQL key conflicts in Lucee CFML, I had a method that would either INSERT
a new row or return the id
(primary key) of an existing row:
component {
public numeric function createThing( required string value ) {
```
<cfquery name="local.results" result="local.metaResults">
INSERT INTO
thing
SET
value = <cfqueryparam value="#value#" sqltype="varchar" />
ON DUPLICATE KEY UPDATE
id = LAST_INSERT_ID( id ) -- Store existing ID into "generatedKey".
;
</cfquery>
```
return( metaResults.generatedKey );
}
}
The problem with this method, as I now see it, is that it's not telling me what it does. It's not "creating a Thing" - it's "creating a Thing or returning the id
of the existing Thing". As such, it should be renamed to literally say as much:
createThingOrReturnExistingID()
It's a bit of a mouth-full, but so what? I already firmly believe that your data access layer should have as many methods as you find helpful. So, there's nothing inherently wrong with having a few more "create" methods that are more explicit in how they differentiate from one another.
This is a personal revelation, but it's certainly not a new idea. If you look in the MongoDB Java API, for example, they have all sorts of methods that perform very specific actions: updateOne()
, updateMany()
, findAndModify()
, findOneAndUpdate()
, findOneAndReplace()
, bulkWrite()
, etc. So, there's much precedence for being explicit; and, my brain is finally getting on the band-wagon.
Want to use code from this post? Check out the license.
Reader Comments
I've added upsert to my internal dictionary. Brilliant. I completely agree with your strategy here. Say what it does. Always need to think about someone else reading your code. Naming is the hardest part of coding!
@Dave,
Agreed, Upsert is a great word - I learned that from the MongoDB API. Naming things is hard. And, the funny thing is, sometimes it's deceptively hard! Like, you'll name something and feel good about it; and then, only over time do you realize that you failed to capture what it actually was and why working with that "thing" feels overly complicated. Such a journey!
Totally! I have many "What was I thinking calling it that!" moments a few years later. And then there's trying to find a Font Awesome icon to match the piece of functionality ;)
On aside, loving the "Working Code" podcast. I listen to it whilst running in the hills here in New Zealand. Keep it up!
@Dave,
Don't even get me started about icons 😜 I spent like 3-hours the other day trying to make an icon in Sketch. At work, all our icons have a
1.5px
stroke, which makes it super hard to design since all the paths needs to start on a.5px
location. I feel like that, alone, is a decision somebody probably regrets at this point!Re: Working Code, thank you for the kinds words! It's been a lot of fun. Glad it's making it all the way over there to the other side of the world. Running up "hills" reminds me of t-shirt that I used to see somewhere that said,
Hill Yis
, which was - I think - supposed to be someone with a New Zealand accent saying "Hell Yes". But, that has nothing to do with anything. Happy Friday! (or maybe it's already Saturday for you).Assuming this is MySQL, I prefer to use the VALUES function on the update part.
Apart from being more readable, it removes the possibility that I make a change to the insert part and forget to change the update (or vice versa)
so e.g.
@Ian,
Yes, this is MySQL. It's funny, so I've actually read through the MySQL documentation on the
ON DUPLICATE KEY UPDATE
, and I've seen theVALUES()
stuff, and I swear I've just never really understood what it was doing. Even just now, as I was about to write this comment, I went and I looked at the documentation again, and for whatever reason, my mind just glazes over 😨I think maybe this is most useful in the multi-row insert. Here's an example in the docs -- and forgive me, I'm mostly talking to myself at this point :
... but, I guess this would not work if you just did
c=a+b
? I'm not sure. I honestly only ever really insert one row at a time. I don't have a good mental model for multi-row inserts.Anyway, apologies for the stream-of-conscience; just trying to wrap my head around it.
@Ben,
That's it right there. I don't know why they've gone immediately to an example of a non-obvious use for this (e.g. the single insert version of your statement I posted), but I suspect that immediate dive in to the edgy/hard to grasp is the problem you're having. I've never used them in any other way than as 'little bit of DRY plus legibility' improvement.
Just seems to me like an obvious requirement of the ON DUPLICATE KEY clause to prevent unnecessary/mistaken duplication of values e.g. Postgres has it with ON CONFLICT/EXCLUDED
It's a bit less of an improvement for me as I nearly always use queryExecute, so I could just repeat the same param name in the insert and update clauses. But I think it's optimal to defer to the db language options when building queries.
@Ian,
Sorry, I promise that I'm not being purposefully obtuse here. But, I appear to be able to refer to the existing row values even without that function. Meaning, in my example in the post, I have:
... where the
id
is from the existing row. So, I am not sure what is different if I just addVALUES()
around it:Maybe it's just that the
VALUES()
is what is recommended by the language? And, not necessarily an actual semantic difference? That's what I mean - I am not sure what it is technically doing.@Ben,
I reckon the disconnect here is you think I'm trying to 'improve' your final query, the one that gets the id in the event that the upserts ends up being an update. I'm not - it's about the first upsert
I'm trying to fix the issue that, using your original upsert example, you have to ensure you have exactly matching pairs of cfqueryparams - one for the INSERT portion, one for the UPDATE. Not particularly onerous for tables with a small number of columns, but for larger tables there is a danger you may mismatch columns, or later on update the INSERT part of you query and forget the UPDATE etc.
The VALUES() function allows me to simply say, for any column
It has the added benefit (for me) of getting me closer to a 'truer'/more portable/easier to read SQL statement, with minimal CF footprint.
So, in other words, I much prefer this version of your original upsert statement;
@Ian,
Ooooooh! I see what you're saying now - the
VALUES()
function is for the value I was going to insert, not the value that was already in the row. That's the part that wasn't clicking for me. For whatever reason, my brain just was not getting that from the documentation.@Ben,
Yes!
...and the reason you're not getting it from the docs (IMO) is they've gone straight to a slightly obfuscatory, non-obvious example. Whereas way back when I found VALUES() I came to it from a 'there must be a way for me to stop repeating all these params cos it's dangerous/tedious', so could gloss over the esoteric stuff. TBF they're entirely in the 'RDBS' domain, and not concerned with param based queries being run from outside MySQL
I hope now my comments about DRY/readability/portability/deference to underlying db language make sense!
@Ian,
Yes, thank you so much for your patience in hand-holding through to this realization. Much appreciated 🙌
@Ian,
I just used the
VALUES(col)
in a production query. Woot woot 💥@Ben,
Sweet! 🤓
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →