Parsing File Paths As Slash-Delimited Lists In MySQL
I've been working on populating a temporary MySQL database table that contains file paths of files that need to be deleted. And, since deleting data is always terrifying, I wanted to write a validation SQL statement that would take the files in the temporary table and then cross-reference them against other tables in the database. In order to do this, I needed to parse the file paths and extract various path segments. And, by using a combination of calls to SUBSTRING_INDEX()
, I was able to treat each file path as a delimited list in MYSQL.
Imagine that I have a file path stored in the form of:
/path/to/images/1234.png
In order to cross-reference this file path with another database table, I need to extract the ID of the parent entity - 1234
in this case - and the filename, which is 1234.png
. If you squint your eyes, you can think of a file path as being a set of nested, delimited lists. The outer list is a series of segments delimited by (/
). And, the filename is an inner list which contains a series of segments delimited by (.
).
Unfortunately, MySQL doesn't have the powerful list functions that ColdFusion has. But, one of the functions that it does have is SUBSTRING_INDEX()
. This function will extract prefixes and suffixes based on a given delimiter.
From the documentation:
SUBSTRING_INDEX( str, delim, count )
Returns the substring from string
str
beforecount
occurrences of the delimiterdelim
. Ifcount
is positive, everything to the left of the final delimiter (counting from the left) is returned. Ifcount
is negative, everything to the right of the final delimiter (counting from the right) is returned.SUBSTRING_INDEX()
performs a case-sensitive match when searching fordelim
.
So, basically, if the count is positive, it returns a prefix; and, if the count is negative, it returns a suffix. Let's try applying this to our file path. Here's extracting prefixes from the path:
SET @path = '/path/to/images/12345.png';
SELECT
( @path ) AS path,
SUBSTRING_INDEX( @path, '/', 1 ) AS segment_1,
SUBSTRING_INDEX( @path, '/', 2 ) AS segment_2,
SUBSTRING_INDEX( @path, '/', 3 ) AS segment_3,
SUBSTRING_INDEX( @path, '/', 4 ) AS segment_4,
SUBSTRING_INDEX( @path, '/', 5 ) AS segment_5
;
As you can see, we're using the /
as the path delimiter. And, when we run this SQL, we get the following output:
mysql> SELECT
-> ( @path ) AS path,
-> SUBSTRING_INDEX( @path, '/', 1 ) AS segment_1,
-> SUBSTRING_INDEX( @path, '/', 2 ) AS segment_2,
-> SUBSTRING_INDEX( @path, '/', 3 ) AS segment_3,
-> SUBSTRING_INDEX( @path, '/', 4 ) AS segment_4,
-> SUBSTRING_INDEX( @path, '/', 5 ) AS segment_5
-> \G
*************************** 1. row ***************************
path: /path/to/images/12345.png
segment_1:
segment_2: /path
segment_3: /path/to
segment_4: /path/to/images
segment_5: /path/to/images/12345.png
As you can see, as the count SUBSTRING_INDEX()
count increases, more of our leading path segments are included in the result.
Now, let's try extracting suffixes from the path:
SET @path = '/path/to/images/12345.png';
SELECT
( @path ) AS path,
SUBSTRING_INDEX( @path, '/', -1 ) AS segment_negative_1,
SUBSTRING_INDEX( @path, '/', -2 ) AS segment_negative_2,
SUBSTRING_INDEX( @path, '/', -3 ) AS segment_negative_3,
SUBSTRING_INDEX( @path, '/', -4 ) AS segment_negative_4,
SUBSTRING_INDEX( @path, '/', -5 ) AS segment_negative_5
;
Again, we're treating the path as a /
delimited list; but, this time we're using negative counts. This SQL gives us the following:
mysql> SELECT
-> ( @path ) AS path,
-> SUBSTRING_INDEX( @path, '/', -1 ) AS segment_negative_1,
-> SUBSTRING_INDEX( @path, '/', -2 ) AS segment_negative_2,
-> SUBSTRING_INDEX( @path, '/', -3 ) AS segment_negative_3,
-> SUBSTRING_INDEX( @path, '/', -4 ) AS segment_negative_4,
-> SUBSTRING_INDEX( @path, '/', -5 ) AS segment_negative_5
-> \G
*************************** 1. row ***************************
path: /path/to/images/12345.png
segment_negative_1: 12345.png
segment_negative_2: images/12345.png
segment_negative_3: to/images/12345.png
segment_negative_4: path/to/images/12345.png
segment_negative_5: /path/to/images/12345.png
As you can see, as the count SUBSTRING_INDEX()
count increases (negatively), more of our trailing path segments are included in the result.
And, with the first result above being the filename in our path, we can really start to see how we can get at the low-level data by passing the result of one SUBSTRING_INDEX()
call in as the input for another SUBSTRING_INDEX()
call:
SET @path = '/path/to/images/12345.png';
SELECT
( @path ) AS path,
SUBSTRING_INDEX( @path, '/', -1 ) AS filename,
SUBSTRING_INDEX( SUBSTRING_INDEX( @path, '/', -1 ), '.', 1 ) AS id,
SUBSTRING_INDEX( SUBSTRING_INDEX( @path, '/', -1 ), '.', -1 ) AS ext
;
In this case, we're extracting the filename from the outer /
delimited list. And then, we're treating the filename as a .
delimited list. So, when we run this SQL, we get the following output:
mysql> SELECT
-> ( @path ) AS path,
-> SUBSTRING_INDEX( @path, '/', -1 ) AS filename,
-> SUBSTRING_INDEX( SUBSTRING_INDEX( @path, '/', -1 ), '.', 1 ) AS id,
-> SUBSTRING_INDEX( SUBSTRING_INDEX( @path, '/', -1 ), '.', -1 ) AS ext
-> \G
*************************** 1. row ***************************
path: /path/to/images/12345.png
filename: 12345.png
id: 12345
ext: png
As you can see, by using a series of nested SUBSTRING_INDEX()
calls, we were able to extract the filename (12345.png
), the ID (12345
), and the file extension (.png
).
These values can then be used in an INNER JOIN
clause for cross-table validation. Here's some pseudo-code for what I was trying to do - imagine that my clean_up
table contains file paths relating to the widget
table:
-- We want to VALIDATE that we're not about to delete REAL DATA. As such, let's
-- check to make sure that the clean_up records have NO CORRESPONDING ROWS in
-- the widget table. We want this query to return ZERO RECORDS!!!! If it
-- returns any records, it means we're about to delete active data.
SELECT
c.id,
c.path
FROM
clean_up c
INNER JOIN
widget w
ON
(
-- Match the widget on the ID extracted from the path.
w.id = SUBSTRING_INDEX( SUBSTRING_INDEX( c.path, '/', -1 ), '.', 1 )
AND
-- Make sure the filenames match as well.
w.imageFilename = SUBSTRING_INDEX( c.path, '/', -1 )
)
;
Most of my day-to-day work involves straightforward queries for data being displayed to the user. But sometimes, in maintenance and migration scripts, things get tricky. And, it's nice to know that MySQL has some affordance for treating strings as lists.
Want to use code from this post? Check out the license.
Reader Comments
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →