Skip to main content
Ben Nadel at CF Summit West 2024 (Las Vegas) with: Kishore Balakrishnan
Ben Nadel at CF Summit West 2024 (Las Vegas) with: Kishore Balakrishnan

Showing The Affects Of Database Index Structure On Membership Table JOIN Performance In MySQL 5.6.37

By
Published in Comments (8)

At InVision, a couple of the engineers on my team are building a new feature that will entail a "membership" pattern of data organization. Meaning, there will be "containers"; and then, "items" within those "containers". Yesterday, we began to noodle on the index design of those database tables; and, while I discussed this type of relationship in my Not-So-Dark Art of Designing Database Indexes article, I did leave the low-level details a bit hand-wavy. As such, I thought it would be fun to take a closer look at how a "membership" style table can be indexed; and, what the EXPLAIN output of the table is along the way in MySQL 5.6.37.

To create a demo context for this, imagine that I have three tables:

  • club
  • club_membership
  • user

The club table is the "container" in which the user records will be "contained". And, the club_membership table is the "join" table (so to speak) that defines exactly which user records live inside which club records. The tables - prior to any relevant indexing - look like this:

CREATE TABLE `club` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`name` varchar(50) NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `club_membership` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`clubID` int(10) unsigned NOT NULL,
	`userID` int(10) unsigned NOT NULL,
	`startedAt` datetime NOT NULL,
	`endedAt` datetime DEFAULT NULL,
	`isFavorite` tinyint(1) unsigned NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `user` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`name` varchar(50) NOT NULL,
	`email` varchar(75) NOT NULL,
	PRIMARY KEY (`id`),
	UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As you can see, the club_membership table associates club and user records using the date-delimiters, startedAt and endedAt. For a club membership to be active, the endedAt column will be set to NULL.

The club_membership table will be used to facilitate three basic query patterns without our demo application:

  1. Show me all active members that are in a given club.

  2. Show me all clubs in which a given user is an active member.

  3. Check to see if a given user is an active member of a given club.

Since none of these questions can be answered using the club_membership's primary key, it means that we will have to use secondary indexes to drive these look-ups. Let's look at each of these different use-cases.

Show Me All Active Members That Are In A Given Club

To represent this use-case, I'm going to use the following SQL query:

EXPLAIN
SELECT
	u.id,
	u.name
FROM
	club_membership cm
INNER JOIN
	club c
ON
	(
			cm.clubID = 72 -- The club (container) in question.
		AND
			cm.endedAt IS NULL -- Active memberships.
		AND
			c.id = cm.clubID
	)
INNER JOIN
	user u
ON
	u.id = cm.userID -- The users (items) within the club (container).
;

As I've written before, I structure my JOIN queries using an exacting methodology that places the "what I have" data on right-side of all comparisons and the "what I need" data on the left-side of all comparisons. I highly, highly, highly recommend this approach, as it brings a consistency to your SQL queries that enhances the readability; and, as we're about to see, makes designing indexes much more obvious.

If we run the above EXPLAIN statement before we add any indexes to the club_membership table, we get the following output:

mysql> EXPLAIN -- .... (truncated) .... no indexes yet.
   -> ;
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                   | rows  | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+-------------+
|  1 | SIMPLE      | c     | const  | PRIMARY       | PRIMARY | 4       | const                 |     1 | Using index |
|  1 | SIMPLE      | cm    | ALL    | NULL          | NULL    | NULL    | NULL                  | 19224 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | invisionapp.cm.userID |     1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+-------------+
3 rows in set (0.00 sec)

As you can see, in order to execute this query, the database would need to scan 19,224 records from the club_membership table. There are no keys available for this query. And, the Using where in the Extra column indicates that it will be doing a full-table scan.

Obviously, we almost never want a full-table scan; so, let's start adding an index. As I said above, my JOIN statements are structured using a "what I have" / "what I need" organization. And, the first "what I have" is the id of the club we are looking-up. So, let's create an index that starts with the "what I need", clubID:

ALTER TABLE `club_membership`
	ADD INDEX `IX_byClub` (`clubID`) USING BTREE
;

With this index, if we re-run the EXPLAIN we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`clubID`).
   -> ;
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref                   | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | c     | const  | PRIMARY       | PRIMARY   | 4       | const                 |    1 | Using index |
|  1 | SIMPLE      | cm    | ref    | IX_byClub     | IX_byClub | 4       | const                 |   18 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY   | 4       | invisionapp.cm.userID |    1 | NULL        |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------------+
3 rows in set (0.00 sec)

Awesome! This time, the database is using our key, IX_byClub, and the number of rows that it will have to scan dropped from 19,224 records down to 18 records! That's a massive improvement. But, the Extra column still has Using where, which means that the database is still reading both indexed data (from the key IX_byClub) and non-indexed data in order to satisfy the INNER JOIN on club_membership.

The next "what I have" / "what I need" question in the JOIN is for endedAt IS NULL. So, let's try adding endedAt to the index:

ALTER TABLE `club_membership`
	DROP INDEX `IX_byClub`,
	ADD INDEX `IX_byClub` (`clubID`, `endedAt`) USING BTREE
;

With this index, if we re-run the EXPLAIN we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`clubID`, `endedAt`).
   -> ;
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-----------------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref                   | rows | Extra                 |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-----------------------+
|  1 | SIMPLE      | c     | const  | PRIMARY       | PRIMARY   | 4       | const                 |    1 | Using index           |
|  1 | SIMPLE      | cm    | ref    | IX_byClub     | IX_byClub | 10      | const,const           |    7 | Using index condition |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY   | 4       | invisionapp.cm.userID |    1 | NULL                  |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-----------------------+
3 rows in set (0.00 sec)

Getting better! Now that we've added endedAt to the index, the MySQL database will only have to scan 7-records instead of 18-records. This is because we've built the "active membership" concept into the index.

Our Extra column has also changed to Using index condition. This means that the MySQL database is using the index to filter the tables in the first INNER JOIN; but then, has to read the entire row in order to process the second INNER JOIN. This isn't inherently bad - our performance at this point will be pretty solid. But, we can do better!

NOTE: I am not good at clearly articulating what the Extra column values mean. This is because I don't have a solid mental model for it. And, I usually have to look the values up when I'm confused. Unfortunately, the MySQL documentation regarding the Extra column rarely adds much clarity :D As such, please bare-with my less-than-stellar explanation of how our Extra column readings evolve through this article.

At this point, our index contains all of the "what I need" / "what I have" conditions to satisfy the first INNER JOIN. As such, let's move onto the second INNER JOIN to the user table. To satisfy this JOIN, we need the userID from the club_membership table. As such, let's add it to the index:

ALTER TABLE `club_membership`
	DROP INDEX `IX_byClub`,
	ADD INDEX `IX_byClub` (`clubID`, `endedAt`, `userID`) USING BTREE
;

Now, with this compound, secondary index in place, if we rerun the EXPLAIN for our query, we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`clubID`, `endedAt`, `userID`).
   -> ;
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref                   | rows | Extra                    |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+--------------------------+
|  1 | SIMPLE      | c     | const  | PRIMARY       | PRIMARY   | 4       | const                 |    1 | Using index              |
|  1 | SIMPLE      | cm    | ref    | IX_byClub     | IX_byClub | 10      | const,const           |    7 | Using where; Using index |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY   | 4       | invisionapp.cm.userID |    1 | NULL                     |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+--------------------------+
3 rows in set (0.00 sec)

We're still scanning 7-records; but, notice that our Extra column has changed from Using index condition to Using where; Using index. What this means is that the MySQL database is now able to pull all needed information out of the index itself. In other words, we've created a covering index for the club_membership table in this particular use-case / query pattern.

A covering index is always a good thing for read-performance. However, it can be bad for write-performance since a more complex index requires more maintenance as more data is written to the table. That said, since this is one of the primary use-cases that this table, I'm more than happy to create covering indexes.

Show Me All Clubs In Which A Given User Is An Active Member

Now that we've mastered the previous use-case, let's move on to identifying the clubs in which a given user has an active membership. To represent this use-case, I'm going to use the following SQL query:

EXPLAIN
SELECT
	c.id,
	c.name
FROM
	club_membership cm
INNER JOIN
	club c
ON
	(
			cm.userID = 24 -- The user (item) in question.
		AND
			cm.endedAt IS NULL -- Active memberships.
		AND
			c.id = cm.clubID -- The clubs (containers) for this user (item).
	)
;

If we run EXPLAIN on this query - leaving in-place the index from our previous use-case - we get the following output:

mysql> EXPLAIN -- .... (truncated) .... no RELEVANT indexes yet.
   -> ;
+----+-------------+-------+------+---------------+-----------+---------+------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key       | key_len | ref                          | rows | Extra                    |
+----+-------------+-------+------+---------------+-----------+---------+------------------------------+------+--------------------------+
|  1 | SIMPLE      | c     | ALL  | PRIMARY       | NULL      | NULL    | NULL                         | 1000 | NULL                     |
|  1 | SIMPLE      | cm    | ref  | IX_byClub     | IX_byClub | 14      | invisionapp.c.id,const,const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+-----------+---------+------------------------------+------+--------------------------+
2 rows in set (0.00 sec)

This output is a little confusing because the Using where; Using index Extra might lead you to think that we're using a covering index; but, if you look at the club table, you'll see that we're reading in 1,000-records. This is a full-table scan (I put 1,000 records into the club table in order to explore this topic). Essentially, the query-optimizer is reversing our INNER JOIN look-up in order to try and use an existing index; however, since none of the indexes were designed for this use-case, the result is terrible.

So, going back to the "what I have" / "what I need" mentality for INNER JOIN structuring, we can see that the first condition is the userID that we're trying to look-up. So, let's add that to an index:

ALTER TABLE `club_membership`
	ADD INDEX `IX_byUser` (`userID`) USING BTREE
;

And, with this index, if we re-run the EXPLAIN we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`userID`).
   -> ;
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-------------+
| id | select_type | table | type   | possible_keys       | key       | key_len | ref                   | rows | Extra       |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | cm    | ref    | IX_byClub,IX_byUser | IX_byUser | 4       | const                 |  672 | Using where |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY             | PRIMARY   | 4       | invisionapp.cm.clubID |    1 | NULL        |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)

This has gotten a little bit better, dropping the table scan from 1,000-records down to 672-records. And, you can see the query optimizer is using the index - IX_byUser - that we just added. But, the Extra column shows Using where, which means that we're reading in both indexed and non-indexed data.

To improve upon this, let's add endedAt to the index, which is the next column in the "what I have" / "what I need" conditions:

ALTER TABLE `club_membership`
	DROP INDEX `IX_byUser`,
	ADD INDEX `IX_byUser` (`userID`, `endedAt`) USING BTREE
;

With this index, if we re-run the EXPLAIN we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`userID`, `endedAt`).
   -> ;
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-----------------------+
| id | select_type | table | type   | possible_keys       | key       | key_len | ref                   | rows | Extra                 |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-----------------------+
|  1 | SIMPLE      | cm    | ref    | IX_byClub,IX_byUser | IX_byUser | 10      | const,const           |  314 | Using index condition |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY             | PRIMARY   | 4       | invisionapp.cm.clubID |    1 | NULL                  |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+-----------------------+
2 rows in set (0.00 sec)

At this point, our index structure has dropped the row-read count from 672-records to 314-records, which represents the active club memberships for the given user. But, our Extra column says Using index condition, which means that the MySQL database is using the index to drive the filtering; but then, needs to read-in the full table row in order to process the full INNER JOIN.

To improve upon this, we can add our final "what I have" / "what I need" condition to the index: clubID.

ALTER TABLE `club_membership`
	DROP INDEX `IX_byUser`,
	ADD INDEX `IX_byUser` (`userID`, `endedAt`, `clubID`) USING BTREE
;

And with this index, if we re-run the EXPLAIN we get the following output:

mysql> EXPLAIN -- .... (truncated) .... index: (`userID`, `endedAt`, `clubID`).
   -> ;
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys       | key       | key_len | ref                   | rows | Extra                    |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+--------------------------+
|  1 | SIMPLE      | cm    | ref    | IX_byClub,IX_byUser | IX_byUser | 10      | const,const           |  315 | Using where; Using index |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY             | PRIMARY   | 4       | invisionapp.cm.clubID |    1 | NULL                     |
+----+-------------+-------+--------+---------------------+-----------+---------+-----------------------+------+--------------------------+
2 rows in set (0.00 sec)

We're still scanning 315-records, which is the number of active club memberships belonging to this user; but, notice that our Extra column has changed from Using index condition to Using where; Using index. This means that the MySQL database is now able to pull all needed information out of the index. Which, again, means that we've created a covering index for the club_membership table in this particular use-case / query pattern.

And again, a covering index - for a primary use case of the table - is awesome!

Check To See If A Given User Is An Active Member Of A Given Club

Our final use-case for the club_membership table is checking to see if a given user has an active membership to a given club. A query like this would normally be used to assert some sort of permission on behalf of the user; or, act as a guard to some sort of membership mutation.

To represent this use-case, I'm going to use the following SQL query:

EXPLAIN
SELECT
	1
FROM
	club_membership cm
WHERE
	cm.clubID = 72 -- The club (container) in question.
AND
	cm.userID = 24 -- The user (item) in question.
AND
	cm.endedAt IS NULL -- Active memberships.
;

Notice that I am pulling back 1 in the SELECT statement. In this case, that's because I don't really care about the data itself - only that the data exists or doesn't exist indicating an active club membership or an ended / non-existent club membership, respectively.

Now, if we run the EXPLAIN on this SQL query, we get the following output:

mysql> EXPLAIN -- .... (truncated) .... only previous indexes.
   -> ;
+----+-------------+-------+------+---------------------+-----------+---------+-------------------+------+--------------------------+
| id | select_type | table | type | possible_keys       | key       | key_len | ref               | rows | Extra                    |
+----+-------------+-------+------+---------------------+-----------+---------+-------------------+------+--------------------------+
|  1 | SIMPLE      | cm    | ref  | IX_byClub,IX_byUser | IX_byClub | 14      | const,const,const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------------+-----------+---------+-------------------+------+--------------------------+
1 row in set (0.00 sec)

Woot woot! As you can see, this particular use-case is already handled by the IX_byClub index, which makes sense since this index has all of the columns that we're looking for - remember what we did above:

ADD INDEX `IX_byClub` (`clubID`, `endedAt`, `userID`) USING BTREE

The difference with this query is that we're not using the userID to power a second INNER JOIN. Instead, we're using it to filter within the first INNER JOIN. And, if you look at the Extra column, we have Using where; Using index which, again, means that we have a covering index. Or, in other words, all of the data that the MySQL database needs to satisfy this query comes directly out of the index - the database doesn't have to read any table rows for this use-case with these indexes.

Writing SQL Is Thrilling

Sometimes, there's nothing quite as satisfying as creating database tables and designing indexes that power your application. Hopefully this exploration has helped illustrate how the structure of an index lines-up with the structure of your SQL queries; and, how you might leverage intentful conditions of your INNER JOIN clauses in order to help identify which columns need to be placed in your database indexes.

Index design can be scary; but, if you remember that indexes are meant to evolve along-side your application and its query-patterns, you can start to let go of some of that stress.

Want to use code from this post? Check out the license.

Reader Comments

238 Comments

I learn so much about EXPLAIN through your articles. These are the finer points of SQL which are easy to ignore, but so useful to bring into the light. Thanks Ben!

15,848 Comments

@Chris,

The EXPLAIN command is awesome; but, I will freely admit that it is generally confusing; and, that I am often confused by it :D The high-level points that I look for are that the tables are:

  1. Using indexes.
  2. Using the indexes that I thought they would use.
  3. Limiting rows in some way.

The Extra column is additional information, but definitely ups-the-level of fuzziness in how I think about the query. Here's the docs for Explain:

https://dev.mysql.com/doc/refman/8.0/en/explain-output.html

... one look at the docs and I'm more confused than I was when I started :D

448 Comments

Ben. Interesting stuff. Just out of interest, why did you not write the 2nd Query like:

SELECT
	u.id,
	u.name
FROM
	club_membership cm
INNER JOIN
	club c
ON
	c.id = cm.clubID
INNER JOIN
	user u
ON
	u.id = cm.userID -- The users (items) within the club (container).
WHERE cm.clubID = 72 -- The club (container) in question.
	AND
	cm.endedAt IS NULL -- Active memberships.
;

I usually, only add ID comparisons to my ON clause?

15,848 Comments

@Charles,

It's a personal preference. I like to do as much of my filtering in the same place that I reference the table. By moving WHERE conditions into my ON clauses, I can see the table and the filtering of that table in the same physical location within the SQL statement.

I am sure that under-the-hood the query planner is doing the same thing in both cases; meaning, I doubt there is any technical difference between the two approaches. It's all just mental.

One thing that I like about having the two "aspects" in the same place is that I think it forces me to think more clearly about limiting the cross-product of the various tables. When I think about a query in two difference phases (ie, "join" and then "filter"), I don't think as clearly about how I want to reduce the number of rows that get scanned.

On the other hand, when I think about each table and the filtering at the same time, it just makes it more clear in my head that I am reducing the cross-product as much as possible.

448 Comments

To be honest, I thought it was only possible to compare the Primary Key & Foreign Key in the ON clause. I didn't realise you could add any kind of comparison?

So, this has been a valuable lesson...

15,848 Comments

@Charles,

Right? SQL is super flexible. It's kind of an amazing language. No wonder it has lasted so long. And, some of the DB systems have all kinds of crazy extensions that I can't even wrap my brain around.

448 Comments

Yes. And MySQL Server is super reliable.

Yesterday I set up a new VPS and decided I couldn't be bothered to do a full MySQL install.

So, I downloaded MySQL 5.5 Server. No workbench. 30MB download. It took about 10 seconds to install and is working perfectly!

At work, we use MSSQL and the install is about 2GB. And that is without MSSQL Management Studio, which is another gigabyte! It is seriously top heavy.

Give me MySQL any day!

15,848 Comments

@Charles,

I used to use MSSQLServer back in the day (when I worked for an MS shop). I remember the management studio being a pretty solid application. Though, at one point, they updated it and it got a loooooot more complicated.

I've heard people rave about PostgreSQL, but I've not used it personally.

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel