A case study on how to correctly use MySQL InnoDB joint (composite) indexes

One business query fetches the 5 most recently audited records:

SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
 AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;

Monitoring at the time showed CPU usage above 100%, and SHOW PROCESSLIST showed many similar queries in the "Creating sort index" state.
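To see how many sessions are in that state at a given moment, one option is to aggregate the process list by state. This is a minimal sketch, assuming the account has the PROCESS privilege and that information_schema.PROCESSLIST is available on the server (recent MySQL 8.0 releases deprecate it in favor of performance_schema.processlist); the alias `sessions` is only illustrative.

```sql
-- Count running statements by thread state. Many sessions in
-- 'Creating sort index' suggest the statements spend their time sorting.
SELECT `STATE`, COUNT(*) AS `sessions`
FROM `information_schema`.`PROCESSLIST`
WHERE `COMMAND` = 'Query'
GROUP BY `STATE`
ORDER BY `sessions` DESC;
```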

The table structure is as follows:

CREATE TABLE `th_content` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT 'Content title',
 `content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT 'Main content',
 `audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'Audit time',
 `last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last edit time',
 `status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT 'Information status',
 PRIMARY KEY (`id`),
 KEY `idx_at_let` (`audit_time`,`last_edit_time`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The table has a joint index whose leftmost column is audit_time, and there is no index on status.

Let's analyze how the above SQL is executed:

  • From the joint index, find the primary key IDs of all rows whose audit_time is less than the given timestamp (if 1 million rows were audited before this timestamp, the 1 million corresponding primary key IDs are retrieved from the joint index)
  • One optimization can apply here: the 1 million primary key IDs are sorted, so that in the next step, the table lookups, primary keys that are close together may be fetched in a single disk I/O.
  • Go back to the table for each ID, reading 1 million rows, and keep only the rows with status = 'ONLINE'
  • Finally, sort the remaining rows (if 500,000 of the rows are ONLINE, all 500,000 rows have to be sorted)

In the end, because the data volume is large, even though only 5 rows are returned, in the extreme example above 1 million rows are actually read and 500,000 of them are then sorted in memory.

So it is very inefficient.
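One way to check this reasoning against the optimizer is to run EXPLAIN on the original query before changing any index. The output depends on the MySQL version and on the data, so none is shown here. The thing to look for is "Using filesort" in the Extra column, which means the matching rows are sorted in a separate step instead of being read in index order.

```sql
-- Ask the optimizer for the plan of the original query (no index changes yet).
EXPLAIN
SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
  AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;
```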

A schematic diagram illustrates the index lookup in the first step. The pink part represents the index entries whose rows ultimately have to be read from the table.

I made up the data in the picture following the index storage rules. If anything is wrong, please leave a message to point it out. I hope the picture shows how a joint index is stored and how it is searched.

Improvement idea 1

Range conditions make it hard to use an index well. What improves if we add a joint index on audit_time and status?

ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);
mysql> explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5;
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+
| id | select_type | table      | type  | possible_keys                     | key              | key_len | ref  | rows   | Extra       |
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | th_content | range | idx_at_ft_pt_let,idx_audit_status | idx_audit_status | 4       | NULL | 209754 | Using where |
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+

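Detail: because audit_time is matched by a range condition, the second index column cannot be used to narrow the search. Only audit_time is used, so key_len is 4 (the 4-byte int column). In idea 2 below, both columns are used and key_len is 5 (1 byte for the enum status plus 4 bytes for audit_time).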

Let's analyze the execution process after adding the index:

  • In the joint index, locate the entry with the largest audit_time that is less than the given timestamp
  • Then scan backward entry by entry (toward smaller audit_time values). Because audit_time < ... is a range condition, the values of the second index column are scattered across the range, so each entry has to be checked against status = 'ONLINE' until the 5th matching entry is found.
  • Go back to the table to fetch the required data for those rows

In the diagram above, pink marks the entries that satisfy the condition on the first index column. Scanning backward entry by entry, we find three matching records on this leaf node, then continue to the left into the previous leaf node until 5 matching records are found, and finally go back to the table.

Improvements

Because the status value is stored in the index, filtering for status = 'ONLINE' does not require going back to the table. Only 5 rows are fetched from the table, which greatly reduces I/O.

Disadvantages of this index

If the first five entries scanned in idx_audit_status all have status ONLINE, only five entries need to be scanned.

If only 4 of the first 1 million entries scanned in idx_audit_status have status ONLINE, at least 1,000,001 entries must be scanned to get the required 5 rows. The number of index entries that has to be scanned is unpredictable.
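To get a rough feel for how exposed a given data set is to this problem, the distribution of status among the rows before the timestamp can be counted. This is only a sketch: the alias `rows_in_status` is illustrative, the query itself reads every matching index entry or row (so it is better run on a replica or a copy), and it gives overall proportions only, not where the ONLINE rows sit in audit_time order.

```sql
-- The smaller the share of 'ONLINE' rows, the more index entries are
-- skipped on average before 5 matching entries are found.
SELECT `status`, COUNT(*) AS `rows_in_status`
FROM `th_content`
WHERE `audit_time` < 1541984478
GROUP BY `status`;
```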

Improvement idea 2

ALTER TABLE `th_content` DROP INDEX `idx_audit_status`;
ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`); 

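With status as the leftmost column, status = 'ONLINE' becomes an equality match, and within that value the index entries are ordered by audit_time and then by the primary key id. The 5 rows can therefore be read directly in index order, so neither sorting nor going back to the table puts any pressure on the server.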
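A quick way to verify the effect is to list the indexes and re-run EXPLAIN after the change. No output is shown because it depends on the MySQL version and table statistics. Under the reasoning above, the hoped-for plan uses idx_status_audit with key_len 5 and has no "Using filesort" in Extra; if the optimizer picks something else, that is worth investigating rather than assuming.

```sql
-- Confirm that idx_status_audit exists with columns (status, audit_time).
SHOW INDEX FROM `th_content`;

-- Check which index the optimizer chooses and whether a sort step remains.
EXPLAIN
SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
  AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;
```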

