Reasons why MySQL 8.0 statistics are inaccurate

Reasons why MySQL 8.0 statistics are inaccurate

Preface

Whether it is Oracle or MySQL, the new features introduced in the new version, on the one hand, bring improvements to the product's functionality, performance, user experience, etc., but on the other hand, may also bring some problems, such as code bugs, problems caused by incorrect customer usage, etc.

Case Study

MySQL 5.7 scenario

(1) First, create two tables and insert data

mysql> select version();
+------------+
| version() |
+------------+
| 5.7.30-log |
+------------+
1 row in set (0.00 sec)

mysql> show create table test\G
*************************** 1. row ***************************
    Table: test
Create Table: CREATE TABLE `test` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> show create table sbtest1\G
*************************** 1. row ***************************
    Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.14 sec)

(2) Check the statistical information of the two tables, both are relatively accurate

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | sbtest1 | 947263 |
+--------------+------------+------------+
1 row in set (0.00 sec)

(3) We continue to insert 10 million records into the test table and check the statistics again. The statistics are still relatively accurate because by default, if the data change exceeds 10%, the statistics will be updated.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (1.50 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | test | 9749036 |
+--------------+------------+------------+
1 row in set (0.00 sec)

MySQL 8.0 scenario

(1) Next, let’s look at the situation under 8.0. Similarly, we create two tables and insert the same records.

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.20 |
+-----------+
1 row in set (0.00 sec)

mysql> show create table test\G
*************************** 1. row ***************************
    Table: test
Create Table: CREATE TABLE `test` (
 `id` int unsigned NOT NULL AUTO_INCREMENT,
 `k` int unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> show create table sbtest1\G
*************************** 1. row ***************************
    Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
 `id` int unsigned NOT NULL AUTO_INCREMENT,
 `k` int unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

(2) Check the statistical information of the two tables, both are relatively accurate

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

(3) Similarly, we continue to insert 10 million records into the test table and check the statistics again. We find that table_rows still shows 100 records, which is a large deviation.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (0.33 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

Cause Analysis

So what causes inaccurate statistics? In fact, MySQL 8.0 caches the statistical information in the view tables and statistics in order to improve the query efficiency of information_schema. The cache expiration time is determined by the parameter information_schema_stats_expiry, which defaults to 86400s. If you want to obtain the latest statistical information, you can use the following two methods:

(1) analyze table

(2) Set information_schema_stats_expiry=0

Keep exploring

So what are the consequences of inaccurate statistical information? Will it affect the execution plan? Next, we test it again

Test 1: The number of records in table test is 100, and the number of records in table sbtest1 is 1 million.

Execute the following SQL and check the execution plan. It uses NLJ. The small table test is used as the driving table (full table scan), and the large table sbtest1 is used as the driven table (primary key association). The execution efficiency is very fast.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

mysql> select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1 | 501885 | 08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977 | 63188288836-92351140030-06390587585-66802097351-49282961843 |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-4664
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
| 1 | SIMPLE | t | NULL | ALL | PRIMARY | NULL | NULL | NULL | 100 | 10.00 | Using where |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.t.id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

Test 2: Table test has about 10 million records, and table sbtest1 has 1 million records

Execute SQL again and check the execution plan. It also follows NLJ. The small table sbtest1 is used as the driving table and the large table test is used as the driven table. This is also the correct execution plan.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (0.33 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

mysql> select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1 | 501885 | 08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977 | 63188288836-92351140030-06390587585-66802097351-49282961843 |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.37 sec)

mysql> explain select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 947468 | 10.00 | Using where |
| 1 | SIMPLE | t | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

Why didn't the optimizer choose the wrong execution plan? As mentioned in the previous article, MySQL 8.0 stores metadata information in the data dictionary table under the mysql library. The information_schema library only provides relatively convenient views for users to query. Therefore, when the optimizer selects an execution plan, it will obtain statistical information from the data dictionary table and generate a correct execution plan.

Summarize

To improve the query efficiency of information_schema, MySQL 8.0 caches the statistical information in the views tables and statistics. The cache expiration time is determined by the parameter information_schema_stats_expiry (it is recommended to set the parameter value to 0). This may cause users to be unable to obtain the latest and accurate statistical information when querying the corresponding views, but it does not affect the selection of the execution plan.

The above are the details of the reasons why MySQL 8.0 statistics are inaccurate. For more information about inaccurate MySQL 8.0 statistics, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Gearman + MySQL to achieve persistence operation example
  • Detailed explanation of deploying MySQL using Docker (data persistence)
  • Detailed explanation of Java emoji persistence in MySQL
  • MySQL 8 new features: how to modify persistent global variables
  • MySQL 8 new features: detailed explanation of persistence of auto-increment primary key
  • Overview of MySQL Statistics
  • Detailed explanation of MySQL persistent statistics

<<:  Javascript basics about built-in objects

>>:  5 Commands to Use the Calculator in Linux Command Line

Recommend

Detailed explanation of styles in uni-app

Table of contents Styles in uni-app Summarize Sty...

Vue implements pull-down to load more

Developers familiar with Element-UI may have had ...

Detailed explanation of how to reduce memory usage in MySql

Preface By default, MySQL will initialize a large...

React realizes secondary linkage effect (staircase effect)

This article shares the specific code of React to...

Detailed installation and configuration tutorial of PostgreSQL 11 under CentOS7

1. Official website address The official website ...

Solution to Vue data assignment problem

Let me summarize a problem that I have encountere...

How to fix the footer at the bottom of the page (multiple methods)

As a front-end Web engineer, you must have encoun...

Explanation of the steps for Tomcat to support https access

How to make tomcat support https access step: (1)...

How to handle MySQL numeric type overflow

Now, let me ask you a question. What happens when...

Introduction to the deletion process of B-tree

In the previous article https://www.jb51.net/arti...

Vue Element front-end application development table list display

1. List query interface effect Before introducing...

Detailed explanation of jquery tag selector application example

This article example shares the specific code of ...