A detailed analysis and processing of MySQL alarms

A detailed analysis and processing of MySQL alarms

Recently, a service has an alarm, which has made me unbearable. The alarm information is as follows:

Metric:mysql.innodb_row_lock_waits Tags:port=4306,service=xxxx diff(#1): 996>900

The general meaning is that there is a database monitoring indicator innodb_row_lock_waits that currently exceeds the threshold of 900

But the embarrassing thing is that every time I went to check the environment after an alarm, the information I got was very limited. There was not enough information in the slow log and error log for analysis. After a while, I began to calm down and analyze the cause of the problem.

First of all, the time point of this alarm information seems to be somewhat regular. I compared the alarm time in recent days and found that it is still relatively regular. So what tasks may be triggered at the system level? I looked up and compared the relevant task configurations and found that there is a scheduled task that will be executed once every minute. But here comes the question. If it is executed once every minute, why will there be large differences in processing results at a specific time? Of course, the explanation of this phenomenon is just a beginning.

In fact, it is quite easy to prove this point. Today, I adopted a wait-and-see mode. I opened the general log around the time of the alarm. From the log output, the frequency of operations is relatively limited.

Soon I got regular alarms, so I started to capture relevant general log records. For example, at 11:18, we can use the following model to get relevant logs. First, get a temporary general log file to capture various DML and execution operations.

cat general.log|grep -E "insert|delete|update|select|exec" > general_tmp.log

Let’s take 11:18 as an example. We can compare the time before and after 1 or 2 minutes. The results are as follows:

# less general_tmp.log |grep "11:18"|wc -l

400

# less general_tmp.log |grep "11:17"|wc -l

666

# less general_tmp.log |grep "11:16"|wc -l

15

It was found that the numbers matched around 1 minute after the alarm was raised.

The data volume of this table is more than 2 million, and the table structure is as follows:

CREATE TABLE `task_queue` (
 `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
 `TaskStepID` bigint(20) DEFAULT NULL COMMENT 'Task step ID task_step_conf',
 `QOrder` int(11) DEFAULT NULL COMMENT 'Queue sorting task_step_confi.Step_ID',
 `QState` tinyint(4) DEFAULT '1' COMMENT 'Queue status 1: Waiting to be executed 2: Executing 3: Successful execution 4: Failed to execute',
 `QExcCount` int(11) DEFAULT '1' COMMENT 'Number of executions',
 `CrtTime` datetime DEFAULT NULL COMMENT 'Creation time',
 `ModTime` datetime DEFAULT NULL COMMENT 'Modification time',
 PRIMARY KEY (`AccID`),
 KEY `idx_taskstepid` (`TaskStepID`),
 KEY `idx_qstate` (`QState`)
) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8

Based on the analysis and comparison in the logs, we can basically identify the SQL as being in an Update operation. The SQL execution plan is as follows:

>>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: UPDATE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_taskstepid,idx_qstate
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary

In this execution result, key_len is 2,9, which is different from the previous ken_len calculation rule. The Extra column has given a clear hint that this is an intersect process. What’s special is that it is a process based on the secondary index level. There is a related parameter index_merge_intersection at the optimizer level.

We know that in MySQL, the primary key is a first-class citizen, and the secondary index will eventually be mapped to the primary key level for processing. The index-level intersect is actually a bit like our left and right hands. The left hand corresponds to some data results mapped to a batch of primary key ids, and the right hand corresponds to some data results mapped to another batch of primary key ids. The primary key id values ​​of the two are intersect-calculated. So in the current scenario, is index-level intersect a good idea?

Here I have imagined three comparative scenarios for analysis. First, this is an update statement. In order to ensure the repeatability of subsequent tests, we can convert it into a select statement.

select * from task_queue where QState=0 and taskstepid =411;

Therefore, our comparative test is based on query statements for comparison and analysis.

Scenario 1: The optimizer keeps the default index_merge_intersection enabled and extracts execution details based on the profile

>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 2: The optimizer turns off index_merge_intersection and compares based on profiles

>set session optimizer_switch='index_merge_intersection=off';


>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: ref
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate
  key_len: 2
   ref: const
   rows: 1451
  filtered: 0.82
  Extra: Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 3: Reconstruct the index and perform comparative analysis

According to the business logic, if a composite index is created, the size of the result set can be greatly reduced, while the idx_qstat index is still retained, so that some businesses can still be used normally.

>alter table task_queue drop key idx_taskstepid;
>alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState);
explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: task_queue
  partitions: NULL
     type: ref
possible_keys: idx_qstate,idx_taskstepid
     key: idx_taskstepid
   key_len: 11
     ref: const,const
     rows: 1
   filtered: 100.00
    Extra: NULL
1 row in set, 1 warning (0.00 sec)

The profile information is:

It can be clearly seen that the "Sending data" part has been reduced by two orders of magnitude through index reconstruction.

So the next thing is to conduct further analysis and verification, with reason and evidence, and the waiting process is no longer hesitant. A day has passed and no more alarms have been received, which once again shows that we should not underestimate these alarms at work.

Summarize

This is the end of this article about MySQL alarm analysis and processing. For more relevant MySQL alarm processing content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to remove the alarm sound when connecting to mysql

<<:  A comprehensive understanding of Vue.js functional components

>>:  Solution to Docker's failure to release ports

Recommend

Full HTML of the upload form with image preview

The upload form with image preview function, the ...

Click the toggle button in Vue to enable the button and then disable it

The implementation method is divided into three s...

A brief discussion on the implementation principle of Webpack4 plugins

Table of contents Preface know Practice makes per...

js object-oriented method to achieve drag effect

This article shares the specific code for impleme...

How to solve the high concurrency problem in MySQL database

Preface We all know that startups initially use m...

Docker adds a bridge and sets the IP address range

I don't know if it's because the binary d...

React entry-level detailed notes

Table of contents 1. Basic understanding of React...

Installation and daemon configuration of Redis on Windows and Linux

# Installation daemon configuration for Redis on ...

HTML blockquote tag usage and beautification

Blockquote Definition and Usage The <blockquot...

Docker deployment and installation steps for Jenkins

First, we need a server with Docker installed. (I...

Detailed steps for deploying Tomcat server based on IDEA

Table of contents Introduction Step 1 Step 2: Cre...