A brief analysis of MySQL cardinality statistics

A brief analysis of MySQL cardinality statistics

1. What is the cardinality?
Cardinality refers to the number of different values ​​​​in a column of a MySQL table.

If this type of index is unique, then the cardinality = number of rows.

If this column is sex, and the enumeration type only has male and female, then its cardinality is 2

The higher the Cardinality, the more worthy the column is to be indexed. MySQL execution plan also selects indexes based on Cardinality.

The cardinality of each column in the table can be seen in the following way.

Take this classic example:
There is a column called sex. The values ​​stored in the sex column are either male or female, and its maximum cardinality is 2.
There is no need to create an index for sex. Because, in order to improve your sex-based query speed, MySQL will create a brand new B+Tree for the new index you selected. But your sex has only two values. For MySQL, even if it creates a B+Tree index for the column you specify, when the query is actually executed, at most one binary query is performed, and the remaining operations can only be traversal, so it is not meaningful to create an index for sex.

2. When does InnoDB update the cardinality?
Parameter: innodb_stats_auto_recalc controls whether MySQL actively recalculates this persistent information. The default is 1 for true and 0 for false.
By default, cardinality information is recalculated when the rows in a table change by more than 10%.

3. The base number is estimated

The base number is not updated in real time! And it is a value estimated through sampling!

As for what the formula for the base is, it may not be important.

It is important to know that it is an estimate calculated by randomly sampling data pages.

And the number of randomly sampled pages can be set by the parameter innodb_stats_persistent_sample_pages, the default value is 20.

This means that the base value is not accurate, and even the results of your calculations may differ greatly each time.

4. Persistent Cardinality

You can use the parameter innodb_stats_persistent to control whether the cardinality is persisted. The default setting is off.

Of course you can set STATS_PERSISTENT=1 for a single table and its innodb_stats_persistent will be automatically enabled.

The advantage of enabling it is that restarting MySQL will not recalculate this value, thus speeding up the restart process.

5. How to actively update the base?

Executing the following SQL will trigger InnoDB to update the cardinality (even if you don't realize it will update the cardinality).

So try to choose a low business peak period

analyze table tableName;

If the number of samples is too small, the calculated cardinality will be extremely wrong. That will most likely cause the MySQL optimizer to choose the wrong index. This is where you can increase the value appropriately. But increasing it too much may cause ANALYZE TABLE to run slowly.

Conversely, ANALYZE TABLE runs too slowly. You can adjust the value of the parameter innodb_stats_persistent_sample_pages appropriately. However, this may lead to inaccurate cardinality calculation.

If there is no way to balance the relationship between the two. Consider reducing the number of indexed columns in the table or limiting the number of partitions to reduce ANALYZE TABLE complexity. The number of columns in the table's primary key is also important because the primary key columns are appended to each nonunique index.

refer to:

https://dev.mysql.com/doc/refman/5.7/en/innodb-persistent-stats.html

https://dev.mysql.com/doc/refman/5.7/en/innodb-analyze-table-complexity.html

The above is a brief analysis of the details of MySQL cardinality statistics. For more information about MySQL cardinality statistics, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Analysis of the principles of Mysql dirty page flush and shrinking table space
  • Recommend several MySQL related tools
  • A brief analysis of MySQL's lru linked list
  • MySQL Query Cache and Buffer Pool
  • mysql method to recursively search for all child nodes of a menu node
  • What is a MySQL tablespace?
  • How to locate MySQL slow queries
  • MySQL Flush-List and dirty page flushing mechanism

<<:  Detailed explanation of web page loading progress bar (recommended)

>>:  Solution to the problem of insufficient storage resource pool of Docker server

Recommend

Native JS to implement real-time clock

Share a real-time clock effect implemented with n...

Detailed explanation of CSS3+JS perfect implementation of magnifying glass mode

About a year ago, I wrote an article: Analysis of...

Tutorial on installing rabbitmq using yum on centos8

Enter the /etc/yum.repos.d/ folder Create rabbitm...

Specific use of pthread_create in linux to create threads

pthread_create function Function Introduction pth...

Differences and usage examples of for, for...in, for...of and forEach in JS

for loop Basic syntax format: for(initialize vari...

MySQL performance optimization: how to use indexes efficiently and correctly

Practice is the only way to test the truth. This ...

MySQL query tree structure method

Table of contents MySQL query tree structure 1. A...

How to connect to MySQL using C++

C++ connects to MySQL for your reference. The spe...

How to remove MySQL from Ubuntu and reinstall it

First delete mysql: sudo apt-get remove mysql-* T...

MySQL Series 10 MySQL Transaction Isolation to Implement Concurrency Control

Table of contents 1. Concurrent access control 2....

js tag syntax usage details

Table of contents 1. Introduction to label statem...

Three ways to forward linux ssh port

ssh is one of the two command line tools I use mo...