Exploring Type-Coercion And Value Comparisons In A JSON Column In MySQL 5.7.32
Generally speaking, SQL is pretty lenient when it comes to simple types and value comparisons. Meaning, from a SQL execution standpoint, 1
and "1"
are equal because the SQL engine will coerce the values as needed (much like ColdFusion). However, document databases like MongoDB are much less lenient and will not cast values on-the-fly. This got me thinking about the new JSON
support in MySQL 5.7: will simple values get coerced when comparing an input to a JSON
path?
CAUTION: While MySQL will happily coerce values on-the-fly during a comparison, note that this does have implications. The most critical of which is that type coercion will bypass index selection during query planning which may result in a full-table scan.
When it comes to the JSON
data structures in MySQL 5.7, you can "extract" a value using JSON_EXTRACT()
(or ->
); and then, you can "unquote" an extracted value using JSON_UNQUOTE()
(or ->>
). The latter of these two methods will remove the quotes (""
) from an extracted String, which may or may not change the actual data-type of the resolved value. Before digging into value comparisons, it wasn't clear to me how all of this would come together.
To test, we can create an in-memory JSON
value using JSON_OBJECT()
. Then, we can just run a number of comparisons against String and Number values:
SET @json = JSON_OBJECT( 'string', '1', 'number', 1 );
SELECT
-- Control comparisons across types.
( 1 = 1 AND 1 = '1' AND '1' = '1' ),
-- Test JSON String against various inputs.
JSON_EXTRACT( @json, '$.string' ) = '1',
JSON_EXTRACT( @json, '$.string' ) = 1,
JSON_UNQUOTE( JSON_EXTRACT( @json, '$.string' ) ) = '1',
JSON_UNQUOTE( JSON_EXTRACT( @json, '$.string' ) ) = 1,
-- Test JSON Number against various inputs.
JSON_EXTRACT( @json, '$.number' ) = '1',
JSON_EXTRACT( @json, '$.number' ) = 1,
JSON_UNQUOTE( JSON_EXTRACT( @json, '$.number' ) ) = '1',
JSON_UNQUOTE( JSON_EXTRACT( @json, '$.number' ) ) = 1
;
As you can see, we have a JSON
payload that contains two keys: one that points to a "numeric" 1
and one that points to a "string" "1"
. Then, we proceed to compare those two values to both string and numeric inputs. And, when we run this SQL query in MySQL 5.7.32, we get the following output:
As you can see, everything came back as TRUE
except these two comparisons:
JSON_EXTRACT( @json, '$.string' ) = 1
JSON_EXTRACT( @json, '$.number' ) = '1'
When comparing an input to a JSON
value, MySQL does not coerce the operands. Meaning, in the context of a JSON
object, 1
and "1"
are not the equal. However, what we can see is that when we "unwrap" the JSON
value using JSON_UNQUOTE()
, MySQL will fall-back to coercing strings and numbers on-the-fly.
JSON Columns: Garbage In, Garbage Out
One of the splendiferous features of a relational database table is that is enforces a schema at the storage level. Which means, if you have an INT
column, your input is going to be stored as a number regardless of how you provide said value (1
vs "1"
). A JSON
data structure - in MySQL 5.7.32 - does not make any such guarantees. Beyond ensuring that said payload is, in fact, valid JSON (JavaScript Object Notation), the database makes no judgments about what is actually in said JSON
payload.
Which means, garbage in is garbage out. In other words, it's up to you and your application layer to make sure all inputs to a JSON
structure are consistent. Any mess-up and it means you're not going to get the expected query results. To see what I mean, let's look at this SQL which accidentally commingles a String value amongst a bunch of Numbers:
SELECT
COUNT( * )
FROM
(
SELECT JSON_OBJECT( 'value', 1 ) AS payload UNION ALL
SELECT JSON_OBJECT( 'value', 1 ) AS payload UNION ALL
SELECT JSON_OBJECT( 'value', '1' ) AS payload UNION ALL -- OOPS!!!!
SELECT JSON_OBJECT( 'value', 1 ) AS payload
) AS derived
WHERE
derived.payload->'$.value' = 1
;
As you can see, we have 4 rows that contain a .value
property. However, when we go to query for the rows in which the .value
property is 1
, we get the following COUNT(*)
:
3
We're missing a row - the one in which someone accidentally stored the .value
property as a String.
ASIDE: As we saw in the previous section, MySQL will fall-back to coercing values on-the-fly if we "unwrap" the
JSON
values. Which means, we'll get back all 4 rows if we changepayload->'$.value'
topayload->>'$.value'
, which is a short-hand syntax for theJSON_UNQUOTE(JSON_EXTRACT())
combination.
With great flexibility comes great responsibility. When you opt-out of the strict schema provided by a relational database, it means that the onus of enforcement falls upon the developer. You now become the one responsible for data integrity. Proceed with caution!
Want to use code from this post? Check out the license.
Reader Comments