Copying Data From One Table To Another Using An INNER JOIN UPDATE Query In MySQL 5.6.37
As database-driven applications evolve, so does the underlying database schema. When this happens, some degree of data transformation and migration often has to happen. For example, I just retrofitted Markdown onto 15-years of HTML content using Lucee CFML. One approach that I make use of a lot is creating an intermediary table in the database with the transformed data; and then, copying that transformed data into the destination table using an UPDATE
query that contains an INNER JOIN
condition. That said, I tend to forget the SQL syntax for this in MySQL 5.6.37; so, I wanted to put together a quick demo for future reference.
Using my Markdown-retrofitting task as a context for this MySQL exploration, imagine that I have a blog_entry
table that contains all of my article content:
mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content | content_markdown | updatedAt |
+----+----------+------------------+---------------------+
| 1 | <p>a</p> | | 1970-01-01 00:00:00 |
| 2 | <p>b</p> | | 1970-01-01 00:00:00 |
| 3 | <p>c</o> | | 1970-01-01 00:00:00 |
| 4 | <p>d</p> | | 1970-01-01 00:00:00 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)
In this blog_entry
table, the content
column is the HTML content for the article. The content_markdown
column is newly-added column that we want to populate with our data migration script.
Now, imagine that we have another table, blog_entry_markdown_cleanup
, which is where we've placed the Markdown content that we've generated using our ColdFusion data-transformation scripts (not shown in this demo):
mysql> SELECT * FROM blog_entry_markdown_cleanup;
+----+----------+
| id | markdown |
+----+----------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+----+----------+
4 rows in set (0.00 sec)
As you can see, in this table, the markdown
column contains the data that we want to move / copy into the content_markdown
column of our blog_entry
table. To do this, we're going to run a MySQL UPDATE
query that joins the two tables together; and then, copies the markdown content from the intermediary table into the destination table:
-- The "cleanup" table contains our retrofitted markdown content produced from our data
-- sanitization task. We want to copy / move that markdown from the cleanup table into
-- the main entry table.
UPDATE
blog_entry_markdown_cleanup c
INNER JOIN
blog_entry e
ON
e.id = c.id
SET
e.content_markdown = c.markdown,
e.updatedAt = UTC_TIMESTAMP()
-- OPTIONAL WHERE CLAUSE: Let's limit it to records that have no markdown.
WHERE
e.content_markdown = ''
;
In this MySQL query, we're joining the two tables together using an INNER JOIN
that matches the related records on the primary key id
column. Then, we copy the markdown content across tables using SET
:
e.content_markdown = c.markdown
And, after we run this MySQL UPDATE
statement with the INNER JOIN
, our blog_entry
table now looks like this:
mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content | content_markdown | updatedAt |
+----+----------+------------------+---------------------+
| 1 | <p>a</p> | a | 2020-03-24 10:43:30 |
| 2 | <p>b</p> | b | 2020-03-24 10:43:30 |
| 3 | <p>c</o> | c | 2020-03-24 10:43:30 |
| 4 | <p>d</p> | d | 2020-03-24 10:43:30 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)
Woot woot! As you can see, we successfully copied the markdown
column from the intermediary table into the content_markdown
column of our destination table.
Now, if you're anything like me, you love to move WHERE
conditions into ON
clauses in MySQL such that all of the filtering logic is in the same place. With a MySQL UPDATE
statement, we can do this no problem. To explore, let's reset our MySQL database tables. Here's our blog_entry
table, this time with a partially populated content_markdown
column:
mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content | content_markdown | updatedAt |
+----+----------+------------------+---------------------+
| 1 | <p>a</p> | | 2020-03-24 10:54:10 |
| 2 | <p>b</p> | | 2020-03-24 10:54:10 |
| 3 | <p>c</o> | c | 2020-03-24 10:54:10 |
| 4 | <p>d</p> | d | 2020-03-24 10:54:10 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)
As you can see, two of the earlier records have no Markdown; but, two of the older records do have Markdown.
And, to see the ON
clause in action, let's change our "cleanup" table to contain some visually different content:
mysql> SELECT * FROM blog_entry_markdown_cleanup;
+----+----------+
| id | markdown |
+----+----------+
| 1 | a-prime |
| 2 | b-prime |
| 3 | c-prime |
| 4 | d-prime |
+----+----------+
4 rows in set (0.00 sec)
As you can see, the Markdown content has all been suffixed with -prime
.
Ok, now let's revisit the data migration query, this time moving all of the filtering logic into a single, collocated ON
clause
-- The "cleanup" table contains our retrofitted markdown content produced from our data
-- sanitization task. We want to copy / move that markdown from the cleanup table into
-- the main entry table.
UPDATE
blog_entry_markdown_cleanup c
INNER JOIN
blog_entry e
ON
(
e.id = c.id
AND
-- NOTE: We've moved the filtering from the WHERE clause up into the ON
-- clause. This is the PREFERRED APPROACH because it keeps all of the
-- filtering logic COLLOCATED in the same portion of the query (making it
-- easier to read - in my opinion).
e.content_markdown = ''
)
SET
e.content_markdown = c.markdown,
e.updatedAt = UTC_TIMESTAMP()
;
As you can see, we've taken our WHERE
condition that prevents overwriting content and moved it into the ON
clause. Now, our MySQL UPDATE
statement will only create a cross-product of tables in which the content_markdown
column has not yet been populated.
And, after we run this MySQL UPDATE
statement, our blog_entry
table now looks like this:
mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content | content_markdown | updatedAt |
+----+----------+------------------+---------------------+
| 1 | <p>a</p> | a-prime | 2020-03-24 11:01:34 |
| 2 | <p>b</p> | b-prime | 2020-03-24 11:01:34 |
| 3 | <p>c</o> | c | 2020-03-24 10:54:10 |
| 4 | <p>d</p> | d | 2020-03-24 10:54:10 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)
As you can see, our UPDATE
statement only updated the first two records since these were the only records that had empty content_markdown
values. Our ON
condition worked like a charm.
Being able to use an INNER JOIN
clause in a MySQL UPDATE
query is incredibly helpful, especially for data migration and transformation scripts. It allows us to create intermediary tables that have isolated, calculated data; and then, move that calculated data from its intermediary table location over to its final table location.
Epilogue On Intermediary Tables For Data Migration / Transformation
One of the side-effects of using an intermediary table for data migration and transformation scripts is that it can be done elsewhere. Meaning, the intermediary table can be created and populated in a local development environment or in a data warehouse, where other technologies - like Lucee CFML / ColdFusion - can be used; and, where processing times and row locks don't have to be a cause-for-concern.
Then, once the intermediary table is populated, it can be copied into a production environment where the cross-table UPDATE
statement can be executed. Of course, updating an entire table in Production comes with its own performance concerns. But, that's a whole other topic.
Want to use code from this post? Check out the license.
Reader Comments
@All,
As a quick follow-up to this post, I wanted to look at using an
INNER JOIN
in a MySQLDELETE
statement:www.bennadel.com/blog/3796-deleting-data-from-a-table-using-an-inner-join-delete-query-in-mysql-5-6-37.htm
Being able to use
JOIN
constructs in both MySQLUPDATE
andDELETE
queries is just awesome 😎