Looking For Database Performance Bottlenecks And Optimizations Using The Sys Schema In MySQL 5.7
Over the last few weeks, I've been looking into the feature enhancements that come with our upgrade to MySQL 5.7 at InVision. The two marquee features that really jumped out at me were the JSON column type support and indexed virtual columns. However, one more feature that I wanted to explore today is the sys schema. The sys schema is a set of Views (and other database objects) that provide more "human accessible" insights into the data collected by the MySQL Performance Schema. These insights might help me find ways to optimize SQL queries and apply better indexes to my application database tables.
Two Sets of Views in the sys Schema
The sys schema comes with two sets of Views: those that have the x$ prefix, and those that do not. The x$-prefixed Views contain "unformatted" data in terms of latencies (times are reported in picoseconds) and raw SQL. The Views without the x$ prefix use human-readable latencies, like "1.4 m", and provide SQL query "hints" (that are basically unreadable).
I'm finding that the x$-prefixed Views are the most helpful since I want to see the raw SQL; and, it's easier to ORDER BY picoseconds. Trying to ORDER BY the human-readable latencies produces undesirable outcomes, since it orders the results based on String values, not numbers. That said, I can still present human-friendly latencies in the SELECT columns by using the sys function, format_time().
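To make that concrete, here's a minimal sketch of the pattern, using the x$statement_analysis view (explored in more depth below): format the latency for display, but sort on the raw picosecond value.
/**
 * Sketch: SELECT the human-friendly latency for display; but, ORDER BY the
 * raw picosecond value so that the sort is numeric, not String-based.
 */
SELECT
	t.query,
	sys.format_time( t.total_latency ) AS total_latency
FROM
	sys.x$statement_analysis t
ORDER BY
	-- Raw picoseconds - this sorts numerically.
	t.total_latency DESC
LIMIT
	10
;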
sys Schema Data is Based on Runtime Performance
This probably goes without saying; but, the data provided in the sys schema is based on runtime performance. Which means, you have to be looking at it on your production database in order for it to provide the most value. Attempting to look at the sys schema on a non-production database (such as a staging database or a read-replica) will only give you insights about that non-production database. Attempting to port those findings onto a different database will be misguided.
ASIDE: The view that reports "redundant indexes" is probably portable, assuming your database schema is the same everywhere.
sys Schema Views of Interest
After looking through the sys schema, I've collected a few SQL queries that feel like they'll be of the most interest for my level of database mastery. Historically, my tools of choice for database performance insights have been the processlist - which shows which SQL queries are running right at this moment; and, the slow-query log - which shows which SQL queries have taken the longest to execute. I've never been a developer that understands much about database IOPs, threads, or locks. As such, I didn't really get much value out of the sys Views that speak to those concerns.
For the following set of SQL queries, assume that something to this effect was being run before each of the queries:
SET @db = 'my_app_database';
The sys schema provides insights across all schemas in a given MySQL instance. As such, I had to limit each query to a particular application schema in order to keep the records relevant. Strangely enough, each SQL query uses a different "name" for this constraint: table_schema, object_schema, and db.
schema_redundant_indexes View

When it comes to database performance, nothing is as magical as database indexes. Indexes provide a way for us to quickly access records based on row properties. This performance, while critical, comes at a cost, both from a storage standpoint and a processing standpoint: every time a row is updated, the relevant indexes need to be updated and persisted to disk. As such, any unnecessary indexing means that you're placing unnecessary load on the database.

The schema_redundant_indexes view helps us locate indexes that are unnecessary since there are other "dominant" indexes that already contain the same column information. What this means is that the database is eating the cost of maintaining those indexes for no reason.
/**
* Find unnecessary indexes. Having to maintain an index is expensive from both a latency
* and a storage standpoint. If we can get rid of any redundant indexes, then we can save
* space and improve performance by dropping them.
*/
SELECT
-- The unnecessarily indexed columns.
t.table_name,
t.redundant_index_name,
t.redundant_index_columns,
-- The index that is already handling the aforementioned columns.
t.dominant_index_name,
-- The SQL statement you can execute in order to drop the unnecessary index.
t.sql_drop_index
FROM
sys.schema_redundant_indexes t
WHERE
t.table_schema = @db
;
The clever thing about this view is that it provides a sql_drop_index column, which contains an actual ALTER statement that shows you how to get rid of the unnecessary index.
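For illustration, the sql_drop_index value looks something like this (the schema, table, and index names here are hypothetical):

ALTER TABLE `my_app_database`.`user_event` DROP INDEX `IX_user_event_userID`;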
schema_unused_indexes View

Dovetailing with the previous view, the schema_unused_indexes view shows us which indexes are not being used. Though, again, I have to stress the fact that this is based on production traffic. Which means, in order for this view to be helpful, your database has to have been live for long enough, and to have received enough traffic, that the access patterns are representative of the full breadth of SQL queries within your web application.
As I stated above, every index comes at a cost in terms of performance and storage space. As such, if we can remove indexes that don't appear to be participating in any SQL queries, we can reduce the load on the database.
/**
* Find unused indexes. Having to maintain an index is expensive from both a latency and a
* storage standpoint. If we have indexes that we never use, then we can save space and
* improve performance by dropping them.
*
* CAUTION: Unlike "redundant indexes", unused indexes are not deterministic. Meaning,
* this set of indexes is based on access patterns of the application. If your application
* database has not been running for a sufficient amount of time, then this list may not
* be indicative of the full breadth of index utilization.
*/
SELECT
t.object_name,
t.index_name
FROM
sys.schema_unused_indexes t
WHERE
t.object_schema = @db
;
Based on what I'm seeing in this view, it becomes obvious that index-usage is a secondary indicator of unused tables! Meaning, if a table doesn't get accessed anymore within your application, none of its indexes will be used. As such, this view, while not directly related to table access, can give you insight into which tables might be droppable.
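To lean into that idea, here's a sketch of my own (this aggregation is not a built-in sys view) that rolls the unused indexes up by table; a table in which every index shows up as unused is a hint that the table itself may no longer be accessed:
/**
 * Count unused indexes per table. NOTE: This is my own aggregation, layered
 * on top of the schema_unused_indexes view.
 */
SELECT
	t.object_name,
	COUNT( * ) AS unused_index_count
FROM
	sys.schema_unused_indexes t
WHERE
	t.object_schema = @db
GROUP BY
	t.object_name
ORDER BY
	unused_index_count DESC
;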
x$schema_index_statistics View

As I mentioned before, every database index comes with a cost of maintenance. The x$schema_index_statistics view can help shed light on that cost by showing us the total latency (time) that has gone into updating an index whenever a row is inserted, updated, or deleted.

But, this view also shows us how much time is spent reading from a given index. This can show us which indexes are "hot"; which, in turn, helps us understand which queries are hot.
/**
* Look at access and update performance for indexes.
*/
SELECT
t.table_name,
t.index_name,
t.rows_selected,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.select_latency ) AS select_latency,
t.rows_inserted,
sys.format_time( t.insert_latency ) AS insert_latency,
t.rows_updated,
sys.format_time( t.update_latency ) AS update_latency,
t.rows_deleted,
sys.format_time( t.delete_latency ) AS delete_latency
FROM
sys.x$schema_index_statistics t
WHERE
t.table_schema = @db
ORDER BY
t.select_latency DESC
;
The data in this view was actually quite surprising to me. Granted, I'm only looking at a staging database (ie, not production); but, the index with the highest total select latency was not at all what I would have expected. And, it probably indicates that a table is being accessed with unnecessary volume.
x$statements_with_full_table_scans View

The thing that database indexes provide - the magic that they bring to the table (pun intended!) - is that they prevent full table scans. That is, they allow us to locate a subset of records without having to read every record into memory first. As you can imagine, a full table scan can be very expensive, depending on the size of the table in question.

The x$statements_with_full_table_scans view allows us to see which of our SQL queries are performing full table scans. Which means, there's likely some opportunity for improvement, whether it be a new database index or some in-app caching.
/**
* Find SQL queries that perform full-table scans. These are queries in which all the rows
* in the table need to be read-in during query execution.
*
* NOTE: For us, this mostly includes queries that involve "look up" tables, like lists of
* permission-types or status-types.
*/
SELECT
t.query,
t.exec_count,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
t.no_index_used_count,
t.no_index_used_pct,
t.rows_sent,
t.rows_examined,
t.rows_sent_avg,
t.rows_examined_avg
FROM
sys.x$statements_with_full_table_scans t
WHERE
t.db = @db
ORDER BY
t.no_index_used_pct DESC,
t.no_index_used_count DESC
;
According to books and documentation that I've read, the MySQL query optimizer will sometimes choose to perform a full table scan even if an index is available. It will do this if the cost of reading from the table is less expensive than the cost of accessing the index. That said, in my experience, this is almost never true, even in a local development environment with a few records. Your mileage may vary.
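If you want to see what the optimizer is actually choosing for a specific query, EXPLAIN will tell you. A quick sketch (the table here is hypothetical): a "type" of ALL, with a NULL "key", indicates a full table scan.

-- Hypothetical example: check whether this query uses an index. In the
-- EXPLAIN output, type=ALL and key=NULL indicate a full table scan.
EXPLAIN SELECT
	*
FROM
	user_event
WHERE
	userID = 123
;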
That said, this view provides a no_index_used_pct column, which is the percentage of the time that the given SQL statement performs a full table scan. What you'll see is that some queries always perform a full table scan whereas other queries only perform a full table scan on rare occasions.
For me, a lot of the SQL queries that show up in this view are those that perform some sort of JOIN to a "look up" table. For example, a number of the queries LEFT OUTER JOIN to company_permission_type, which is a static list of key-value pairs. This table can probably just get cached in memory, moving the "join" into the application code.
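As a sketch of what that refactoring might look like (the surrounding query here is hypothetical), the look-up JOIN simply disappears from the SQL; and, the key-to-label mapping happens against an in-memory cache in the application:

-- BEFORE (hypothetical): joining to the static look-up table on every query.
SELECT
	r.id,
	r.name,
	pt.label AS permission_label
FROM
	company_role r
LEFT OUTER JOIN
	company_permission_type pt
ON
	pt.id = r.permissionTypeID
;

-- AFTER (hypothetical): select only the foreign key; the application then
-- maps permissionTypeID to a label using an in-memory copy of the look-up
-- table.
SELECT
	r.id,
	r.name,
	r.permissionTypeID
FROM
	company_role r
;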
x$schema_table_statistics View

The x$schema_table_statistics view provides general CRUD (Create, Read, Update, Delete) insights around table access. This includes both the number of records affected as well as the total time it takes to perform the various CRUD operations.

This view can help us locate "hot" tables; and, tables that may benefit from new indexes. It may also give us insight into which tables should be destructured. For example, if a table has a high update latency, it may mean that the updated data is causing too much row contention and some of the columns may need to be "extracted" (see the sketch after the query below).
/**
* Look at general latency statistics (how much overall time it takes) to perform various
* CRUD operations (Create, Read, Update, Delete) on your database tables.
*/
SELECT
t.table_name,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
-- Number of rows read and total latency for read operations.
t.rows_fetched,
sys.format_time( t.fetch_latency ) AS fetch_latency,
-- Number of rows inserted and total latency for insert operations.
t.rows_inserted,
sys.format_time( t.insert_latency ) AS insert_latency,
-- Number of rows updated and total latency for update operations.
t.rows_updated,
sys.format_time( t.update_latency ) AS update_latency,
-- Number of rows deleted and total latency for delete operations.
t.rows_deleted,
sys.format_time( t.delete_latency ) AS delete_latency
FROM
sys.x$schema_table_statistics t
WHERE
t.table_schema = @db
ORDER BY
t.total_latency DESC
;
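Regarding the "extraction" idea above, here's a purely hypothetical sketch of what that destructuring might look like: moving a frequently-updated counter out of a wide table and into its own narrow table so that the hot updates stop contending with reads of the wider row.

/**
 * HYPOTHETICAL SKETCH: extract a hot, frequently-updated counter into its
 * own narrow table in order to reduce contention on the wide "document" row.
 */
CREATE TABLE document_view_count (
	documentID INT UNSIGNED NOT NULL,
	viewCount INT UNSIGNED NOT NULL DEFAULT 0,
	PRIMARY KEY ( documentID )
);

-- The hot-path update now touches only the narrow table.
UPDATE
	document_view_count
SET
	viewCount = ( viewCount + 1 )
WHERE
	documentID = 123
;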
x$statement_analysis View

Whereas the previous view showed us table-level statistics, the x$statement_analysis view gets into the low-level details about specific SQL queries, including how often they are executed, how long they take, and whether or not they required temp tables (which have a non-trivial performance cost).
/**
* Find SQL queries that have the most latency (how much overall time it takes).
*/
SELECT
-- The raw SQL statement being executed.
t.query,
t.exec_count,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
sys.format_time( t.max_latency ) AS max_latency,
sys.format_time( t.avg_latency ) AS avg_latency,
sys.format_time( t.lock_latency ) AS lock_latency,
t.rows_sent,
t.rows_sent_avg,
t.rows_examined,
-- The total number of internal in-memory temporary tables created by occurrences of
-- this query.
t.tmp_tables,
-- The total number of internal on-disk temporary tables created by occurrences of
-- this query.
t.tmp_disk_tables,
t.rows_sorted
FROM
sys.x$statement_analysis t
WHERE
t.db = @db
ORDER BY
t.total_latency DESC
;
Here's a shocker for me - in our staging database (where I am doing this exploration), this is the SQL query with the highest average latency:
SELECT * FROM user LIMIT 1
Given the fact that this has no WHERE clause, I can only assume that someone has chosen to include this SQL query in some sort of a health-check. But, here's the question: why the SELECT *? This is reading-in the whole table row, which seems unnecessary for a health check. It's relatively slow, especially when you consider that something like this:
SELECT 1 FROM user LIMIT 1
... would use a covering index and be much faster.
Of course, if this is for a health check, then why not just use:
SELECT 1
This is a valid query and should validate the database connection just as well.
I can already see that this sys view is going to be super helpful; and, likely lead to a lot of cross-team communication.
x$statements_with_sorting View

When querying for rows, the rows come out of the database in an order that is based on the structure of the underlying indexes. As such, you often need to add an ORDER BY clause to your SQL query. But, ordering records in a way that is different from the underlying indexes has a performance cost. The x$statements_with_sorting view can shed light on that cost.
/**
* Find SQL queries that perform sorts. Sorting in the DB can have a negative effect on
* performance. Perhaps these queries can push sorting down into the application layer.
*/
SELECT
t.query,
t.exec_count,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
t.sorts_using_scans,
t.sort_using_range,
t.rows_sorted,
t.avg_rows_sorted
FROM
sys.x$statements_with_sorting t
WHERE
t.db = @db
ORDER BY
t.total_latency DESC
;
The thing to understand with an ORDER BY is that you likely have one database server and many more application servers. As such, a super easy way to increase the performance of the queries in this view is to move the ORDER BY clauses into the application layer. In Lucee CFML, this could not be easier: the query returned from the CFQuery tag has a .sort() method. Which means that a SQL statement like this:
ORDER BY id ASC, name ASC, email ASC
... can effortlessly be replaced with a ColdFusion statement like this:
.sort( "id, name, email", "asc, asc, asc" )
And, just like that, you've load balanced your ORDER BY cost across all of your ColdFusion application servers.
NOTE: I am not advocating that all ORDER BY clauses be moved into the application layer. I'm only saying that if you see some "hot" SQL queries that have high sort latencies, moving the ORDER BY into the application layer is a low-hanging fruit that might squeeze some better performance out of those SQL queries specifically.
x$statements_with_temp_tables View

When MySQL needs to create an intermediary result-set as part of its processing, it uses temp tables. Sometimes these temp tables are stored in memory; sometimes they are persisted to disk. To be honest, I can never remember all of the rules around when temp tables get created; but, I believe it's usually related to things like ORDER BY, UNION, and LIMIT. And, I know they have some cost associated with them.

The x$statements_with_temp_tables view can help us see which SQL queries create the most temp tables. Which, in turn, may give us an indication of which queries need optimization.
/**
* Find SQL queries that create in-memory or on-disk TEMP tables in order to calculate the
* results. Temp tables can put a lot of load on the database.
*/
SELECT
t.query,
t.exec_count,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
-- The total number of internal in-memory temporary tables created by occurrences of
-- this query.
t.memory_tmp_tables,
-- The total number of internal on-disk temporary tables created by occurrences of
-- this query.
t.disk_tmp_tables,
t.avg_tmp_tables_per_query,
t.tmp_tables_to_disk_pct
FROM
sys.x$statements_with_temp_tables t
WHERE
t.db = @db
ORDER BY
t.total_latency DESC
;
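If you want to check whether one specific query creates a temp table, EXPLAIN reports it: when the Extra column contains "Using temporary", an internal temporary table is involved. A hypothetical sketch:

-- Hypothetical example: a GROUP BY on one column combined with an ORDER BY
-- on a different expression typically forces an internal temporary table.
-- Look for "Using temporary" (and, often, "Using filesort") in the Extra
-- column of the EXPLAIN output.
EXPLAIN SELECT
	userID,
	COUNT( * ) AS event_count
FROM
	user_event
GROUP BY
	userID
ORDER BY
	event_count DESC
;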
x$statements_with_runtimes_in_95th_percentile View

The x$statements_with_runtimes_in_95th_percentile view shows us the SQL queries where 95% of all the executions fall under the given latencies.
/**
* Find SQL queries with the highest latency in the 95th percentile. This means, that 95%
* of the query instances fall under the given latencies.
*/
SELECT
t.query,
t.exec_count,
-- Latency in the "x$" tables are reported as picoseconds. This built-in function
-- formats the raw time into human-friendly values, like "3.5 s" and "56.07 ms".
sys.format_time( t.total_latency ) AS total_latency,
sys.format_time( t.max_latency ) AS max_latency,
sys.format_time( t.avg_latency ) AS avg_latency,
t.rows_sent,
t.rows_sent_avg,
t.rows_examined,
t.rows_examined_avg
FROM
sys.x$statements_with_runtimes_in_95th_percentile t
WHERE
t.db = @db
ORDER BY
t.avg_latency DESC
;
I am not great at statistics; so, I am not sure how this view will dovetail with the x$statement_analysis view, which also gives us SQL query latency metrics. I suppose the P95 speaks more to how the vast majority of executions perform for a given SQL statement? I'll have to get more familiar with this one.
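One way to reconcile the two views: in the 5.7 sys schema, both expose a digest column (a hash of the normalized statement); so, you can JOIN them in order to pull the broader statement metrics for just the P95 offenders. A sketch:

/**
 * Sketch: join the P95 view back to x$statement_analysis on the statement
 * digest in order to pull richer metrics (lock latency, temp tables) for
 * just the statements whose runtimes land in the 95th percentile.
 */
SELECT
	p.query,
	p.exec_count,
	sys.format_time( p.avg_latency ) AS avg_latency,
	sys.format_time( s.lock_latency ) AS lock_latency,
	s.tmp_disk_tables
FROM
	sys.x$statements_with_runtimes_in_95th_percentile p
INNER JOIN
	sys.x$statement_analysis s
ON
	s.digest = p.digest
WHERE
	p.db = @db
ORDER BY
	p.avg_latency DESC
;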
schema_auto_increment_columns View

The last view that I think will be interesting is the schema_auto_increment_columns view. This one shows us how much of our primary key space we've consumed across all of our tables. In other words, it shows us how long until any given table goes Boom (when it runs out of auto-incrementing IDs).
/**
* Find the amount of auto-increment "space" that has been used. This can help identify
* tables that are running out of available ID values.
*/
SELECT
t.table_name,
t.column_name,
-- The highest possible ID that can be created with this data-type.
t.max_value,
-- The last ID created in this table.
t.auto_increment,
-- The amount of "ID space" that has been used-up. When this hits 100%, things go
-- "Boom"!
CONCAT( ( t.auto_increment_ratio * 100 ), '%' ) AS percent_used
FROM
sys.schema_auto_increment_columns t
WHERE
t.table_schema = @db
;
Another interesting aspect of this view is that it shows us which tables use signed vs. unsigned INT values. Spoiler alert: they should all be unsigned. Use of any signed INT values was a mistake; and, it cuts down significantly on the number of IDs that are available... until Boom!
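To put numbers to that: a signed INT tops out at 2,147,483,647 while an unsigned INT tops out at 4,294,967,295 - double the ID space. The conversion itself is a one-line ALTER (the table name here is hypothetical; and, note that on a large table, this will rebuild the table):

-- Hypothetical sketch: converting a signed auto-increment column to unsigned
-- doubles the available ID space (2,147,483,647 becomes 4,294,967,295).
ALTER TABLE
	user_event
MODIFY
	id INT UNSIGNED NOT NULL AUTO_INCREMENT
;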
This is Really Exciting!
As I mentioned at the onset, I've historically used the processlist and the slow-query log to find poorly performing SQL queries. But, this sys schema is really just gonna blow those techniques out of the water! Even running this on our staging database, I can already spot some issues - I'm feeling confident that when we let this schema loose on production, all kinds of interesting insights will come to the surface.
Epilogue: Lots of Slow ORM Queries
I'm a fan of writing SQL. Few things bring me more joy (and a sense of control over performance). Which is why I tend to shy away from ORMs: Object-Relational Mappers. ORMs provide a layer of abstraction over the database structures, which - in my opinion - makes it easier to write problematic SQL queries. And, to that end, I will say that I did see a lot of what looked like ORM-based SQL queries showing up with high latencies.
ORM queries are usually easy to spot because they add a lot of "alias cruft" to the SQL statement, like this:
SELECT
MyTable.id,
MyTable.name
FROM
my_table AS MyTable
;
In this case, the MyTable alias probably relates back to the "model name" used in the ORM definition.

I'm not trying to hate on ORMs here - I know people love them. It was just something that popped out at me when looking through the sys views. That said, a frustrating side-effect of using an ORM is that it makes the SQL queries harder to locate in the code because they are more generic. At least with a hand-written SQL statement, there are usually "hints" and "landmarks" that I can use in order to search for the given query.
Reader Comments
Very well written article! I was looking for some guidance when I started using the sys schema tables for MySQL query performance and identification of bottlenecks. Lucky to have found this one at the very beginning of my search.
@Kameswararao,
Very cool! I'm glad you found this helpful. May it help you find performance bottlenecks!
Hi Ben,
tell me dude, is there some way to join the db column in the processlist table with another view, in order to catch the specific user that ran a given query?
@Marco,
Yes, while the SHOW FULL PROCESSLIST command doesn't allow you to JOIN to it, there is another way to access the process list as a table using the information schema. Something like this (a minimal sketch against the information_schema.PROCESSLIST table):
Once you have it in this format, you can JOIN to any other table as you would normally. I have some examples of using this to group by the user's IP address here:

www.bennadel.com/blog/3054-grouping-the-mysql-processlist-by-ip-address-to-view-connection-counts.htm

I hope that helps!
@Ben,
Hunn man.... thanks Ben for answering my question :)

The point is, in comparison to other DBMSs, sometimes I need to make reports that cross the information: database vs. the user (schema) that ran some bad script - one without an index, or something like these issues - because I have more than 5 projects in production using different users. Then it's easier to show the dev team the specific project in their code.

And in your post you teach us to make some interesting reports; but in my case (again :) ) the dev team needs the user in those reports... :)
@Marco,
I understand - I've been a big fan of the "slow log" to find some bad queries that teams need to address. Good luck! It's always an ongoing battle.
How old is the data in tables like sys.x$schema_table_statistics? Is it since the last reboot? Is there any way, outside of a reboot, to clear these tables in order to get fresh/current data?
Thanks!
@Colin,
That's a great question - I actually have no idea. Sadly, where I work, I was never granted access to our production performance schema; so, I haven't really had much chance to make this stuff more second nature 😞