Let's talk about the performance of MySQL's COUNT(*)

Preface

Basically, programmers in the workplace use count(*), count(1) or count(primary key) to count the number of rows in a database table. Do you understand the differences and performance between them?

In fact, during the development process, it is a very time-consuming operation for programmers to count the total number of rows in a large table. So which method should we use to count faster?

Next, let’s talk about the method and performance of counting the total number of rows in MySQL.

Which is faster: count(*), count(1), or count(primary key)?

1. Create a table and insert 10 million records for experimental testing:

# Create a test table CREATE TABLE `t6` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(50) NOT NULL,
 `status` tinyint(4) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `idx_status` (`status`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

# Create a stored procedure to insert 1000w data CREATE PROCEDURE insert_1000w()
BEGIN
  DECLARE i INT;
  SET i=1;
  WHILE i<=10000000 DO
    INSERT INTO t6(name,status) VALUES('god-jiang-666',1);
    SET i=i+1;
  END WHILE;
END;

#Call the stored procedure to insert 10 million rows of data call insert_1000w();

2. Analyze the experimental results

# It took 0.572 seconds select count(*) from t6;

insert image description here

# It took 0.572 seconds select count(1) from t6;

insert image description here

# It took 0.580 seconds select count(id) from t6;

insert image description here

# It took 0.620 seconds select count(*) from t6 force index (primary);

insert image description here

From the above experiment, we can conclude that count(*) and count(1) are the fastest, followed by count(id), and the slowest is when count uses a mandatory primary key .

Let's continue to test their respective execution plans:

explain select count(*) from t6;
show warnings;

insert image description here

explain select count(1) from t6;
show warnings;

insert image description here

explain select count(id) from t6;
show warnings;

insert image description here

explain select count(*) from t6 force index (primary);
show warnings;

insert image description here

From the above experiment, we can draw the following three points:

The MySQL query optimizer rewrites count(*) to count(0) and selects the idx_status index.
Both count(1) and count(id) select the idx_statux index
After adding force index (primary), the forced index is used.

This idx_status is equivalent to a secondary auxiliary index tree. Its purpose is to illustrate that when InnoDB processes count(*), if there is an auxiliary index tree, it will give priority to the auxiliary index tree to count the total number of rows.

To verify the conclusion that count(*) will give priority to the auxiliary index tree, let's continue with the following experiment:

# Delete the idx_status index and continue to execute count(*)
alter table t6 drop index idx_status;

explain select count(*) from t6;

insert image description here

From the above experiment, we can conclude that if the auxiliary index tree idx_status is deleted, count(*) will choose to use the primary key index. So the conclusion is: count(*) will give priority to the auxiliary index. If there is no auxiliary index, the primary key index will be used.

Why does count(*) prefer secondary indexes?

Prior to MySQL 5.7.18, InnoDB processed count(*) statements by scanning the clustered index.

Starting with MySQL 5.7.18, InnoDB handles COUNT(*) statements by traversing the smallest available secondary index. If no secondary index exists, the clustered index is scanned.

Why does the new version use secondary indexes to process count(*)?

Because the leaf nodes of the InnoDB secondary index tree store the primary key, and the leaf nodes of the primary key index tree store the entire row of data, the secondary index tree is smaller than the primary key index tree. Therefore, the query optimizer gives priority to secondary indexes based on cost considerations. So index count(*) is faster than count(primary key).

Summarize

The conclusion of this article is count(*)=count(1)>count(id) .

Why is count(id) slower without the primary key index? Because count(id) needs to retrieve the primary key, then check whether it is empty, and then accumulate it, which is more expensive.

Count(*) will count all NOT NULL and NULL fields, while count(id) will not count NULL fields, so we should try to use NOT NULL when building tables and give it a default of empty.

Finally, when you want to total the number of rows in a database table in the future, you can boldly use count(*) or count(1).

References

Chapter 6 of "High Performance MySQL" (3rd Edition) Optimizing COUNT() Queries
"MySQL Practice 45 Lectures" by Lin Xiaobin

This is the end of this article about the performance of MySQL COUNT(*). For more information about MySQL COUNT(*), please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:

Optimized implementation of count() for large MySQL tables
Usage and performance optimization techniques of aggregate function count in MySQL
Sharing on count optimization issues in innodb in mysql
Detailed explanation of the correct use of the count function in MySQL
A brief discussion on MySQL count of rows
Summary of methods to improve mysql count
Detailed explanation of count without filter conditions in MySQL
Summary of the differences between count(*), count(1) and count(col) in MySQL
Detailed explanation of mySQL count data examples in multiple tables
Use and optimization of MySQL COUNT function

<<: Limit HTML text box input to only numbers and decimal points

>>: Use non-root users to execute script operations in docker containers