Skip to main content
Ben Nadel at InVision In Real Life (IRL) 2019 (Phoenix, AZ) with: Gabriel Zeck
Ben Nadel at InVision In Real Life (IRL) 2019 (Phoenix, AZ) with: Gabriel Zeck

Copying Data From One Table To Another Using An INNER JOIN UPDATE Query In MySQL 5.6.37

By
Published in Comments (1)

As database-driven applications evolve, so does the underlying database schema. When this happens, some degree of data transformation and migration often has to happen. For example, I just retrofitted Markdown onto 15-years of HTML content using Lucee CFML. One approach that I make use of a lot is creating an intermediary table in the database with the transformed data; and then, copying that transformed data into the destination table using an UPDATE query that contains an INNER JOIN condition. That said, I tend to forget the SQL syntax for this in MySQL 5.6.37; so, I wanted to put together a quick demo for future reference.

Using my Markdown-retrofitting task as a context for this MySQL exploration, imagine that I have a blog_entry table that contains all of my article content:

mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content  | content_markdown | updatedAt           |
+----+----------+------------------+---------------------+
|  1 | <p>a</p> |                  | 1970-01-01 00:00:00 |
|  2 | <p>b</p> |                  | 1970-01-01 00:00:00 |
|  3 | <p>c</o> |                  | 1970-01-01 00:00:00 |
|  4 | <p>d</p> |                  | 1970-01-01 00:00:00 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)

In this blog_entry table, the content column is the HTML content for the article. The content_markdown column is newly-added column that we want to populate with our data migration script.

Now, imagine that we have another table, blog_entry_markdown_cleanup, which is where we've placed the Markdown content that we've generated using our ColdFusion data-transformation scripts (not shown in this demo):

mysql> SELECT * FROM blog_entry_markdown_cleanup;
+----+----------+
| id | markdown |
+----+----------+
|  1 | a        |
|  2 | b        |
|  3 | c        |
|  4 | d        |
+----+----------+
4 rows in set (0.00 sec)

As you can see, in this table, the markdown column contains the data that we want to move / copy into the content_markdown column of our blog_entry table. To do this, we're going to run a MySQL UPDATE query that joins the two tables together; and then, copies the markdown content from the intermediary table into the destination table:

-- The "cleanup" table contains our retrofitted markdown content produced from our data
-- sanitization task. We want to copy / move that markdown from the cleanup table into
-- the main entry table.
UPDATE
	blog_entry_markdown_cleanup c
INNER JOIN
	blog_entry e
ON
	e.id = c.id
SET
	e.content_markdown = c.markdown,
	e.updatedAt = UTC_TIMESTAMP()

-- OPTIONAL WHERE CLAUSE: Let's limit it to records that have no markdown.
WHERE
	e.content_markdown = ''
;

In this MySQL query, we're joining the two tables together using an INNER JOIN that matches the related records on the primary key id column. Then, we copy the markdown content across tables using SET:

e.content_markdown = c.markdown

And, after we run this MySQL UPDATE statement with the INNER JOIN, our blog_entry table now looks like this:

mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content  | content_markdown | updatedAt           |
+----+----------+------------------+---------------------+
|  1 | <p>a</p> | a                | 2020-03-24 10:43:30 |
|  2 | <p>b</p> | b                | 2020-03-24 10:43:30 |
|  3 | <p>c</o> | c                | 2020-03-24 10:43:30 |
|  4 | <p>d</p> | d                | 2020-03-24 10:43:30 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)

Woot woot! As you can see, we successfully copied the markdown column from the intermediary table into the content_markdown column of our destination table.

Now, if you're anything like me, you love to move WHERE conditions into ON clauses in MySQL such that all of the filtering logic is in the same place. With a MySQL UPDATE statement, we can do this no problem. To explore, let's reset our MySQL database tables. Here's our blog_entry table, this time with a partially populated content_markdown column:

mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content  | content_markdown | updatedAt           |
+----+----------+------------------+---------------------+
|  1 | <p>a</p> |                  | 2020-03-24 10:54:10 |
|  2 | <p>b</p> |                  | 2020-03-24 10:54:10 |
|  3 | <p>c</o> | c                | 2020-03-24 10:54:10 |
|  4 | <p>d</p> | d                | 2020-03-24 10:54:10 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)

As you can see, two of the earlier records have no Markdown; but, two of the older records do have Markdown.

And, to see the ON clause in action, let's change our "cleanup" table to contain some visually different content:

mysql> SELECT * FROM blog_entry_markdown_cleanup;
+----+----------+
| id | markdown |
+----+----------+
|  1 | a-prime  |
|  2 | b-prime  |
|  3 | c-prime  |
|  4 | d-prime  |
+----+----------+
4 rows in set (0.00 sec)

As you can see, the Markdown content has all been suffixed with -prime.

Ok, now let's revisit the data migration query, this time moving all of the filtering logic into a single, collocated ON clause

-- The "cleanup" table contains our retrofitted markdown content produced from our data
-- sanitization task. We want to copy / move that markdown from the cleanup table into
-- the main entry table.
UPDATE
	blog_entry_markdown_cleanup c
INNER JOIN
	blog_entry e
ON
	(
			e.id = c.id
		AND
			-- NOTE: We've moved the filtering from the WHERE clause up into the ON
			-- clause. This is the PREFERRED APPROACH because it keeps all of the
			-- filtering logic COLLOCATED in the same portion of the query (making it
			-- easier to read - in my opinion).
			e.content_markdown = ''
	)
SET
	e.content_markdown = c.markdown,
	e.updatedAt = UTC_TIMESTAMP()
;

As you can see, we've taken our WHERE condition that prevents overwriting content and moved it into the ON clause. Now, our MySQL UPDATE statement will only create a cross-product of tables in which the content_markdown column has not yet been populated.

And, after we run this MySQL UPDATE statement, our blog_entry table now looks like this:

mysql> SELECT * FROM blog_entry;
+----+----------+------------------+---------------------+
| id | content  | content_markdown | updatedAt           |
+----+----------+------------------+---------------------+
|  1 | <p>a</p> | a-prime          | 2020-03-24 11:01:34 |
|  2 | <p>b</p> | b-prime          | 2020-03-24 11:01:34 |
|  3 | <p>c</o> | c                | 2020-03-24 10:54:10 |
|  4 | <p>d</p> | d                | 2020-03-24 10:54:10 |
+----+----------+------------------+---------------------+
4 rows in set (0.00 sec)

As you can see, our UPDATE statement only updated the first two records since these were the only records that had empty content_markdown values. Our ON condition worked like a charm.

Being able to use an INNER JOIN clause in a MySQL UPDATE query is incredibly helpful, especially for data migration and transformation scripts. It allows us to create intermediary tables that have isolated, calculated data; and then, move that calculated data from its intermediary table location over to its final table location.

Epilogue On Intermediary Tables For Data Migration / Transformation

One of the side-effects of using an intermediary table for data migration and transformation scripts is that it can be done elsewhere. Meaning, the intermediary table can be created and populated in a local development environment or in a data warehouse, where other technologies - like Lucee CFML / ColdFusion - can be used; and, where processing times and row locks don't have to be a cause-for-concern.

Then, once the intermediary table is populated, it can be copied into a production environment where the cross-table UPDATE statement can be executed. Of course, updating an entire table in Production comes with its own performance concerns. But, that's a whole other topic.

Want to use code from this post? Check out the license.

Reader Comments

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel