Why is it slow when using limit and offset paging scenarios?

Why is it slow when using limit and offset paging scenarios?

Let’s start with a question

Five years ago when I was at Tencent, I found that MySQL request speed was very slow in paging scenarios. When the data volume is only 10w, select xx from a single machine takes about 2 or 3 seconds.

I asked my master why, and he asked back, "In the index scenario, what is the time complexity of getting the nth largest number in MySQL?"

The search for answers

Confirm the scenario

Assume that there is an index on status. select * from table where status = xx limit 10 offset 10000.

It will be very slow. If the amount of data is not large, there will be a delay of several seconds.

Xiaobai answers

At that time, I felt very safe. My teacher would take care of me no matter what happened. My technical skills were the worst in the group anyway, so I made a blind guess and thought that finding a node would be just log(N). Naturally, my master let me study it on my own.

This stage took 10 minutes.

Continue answering

If you analyze it carefully, you will find that it is awkward to search through the index. Because you don't know the distribution of the first 100 numbers in the left subtree and the right subtree, it is impossible to use the search characteristics of the binary tree.

Through learning, I learned that MySQL's index is a B+ tree.

After looking at this picture, everything became clear. The 100th largest tree can be found directly through the linked list composed of leaf nodes with a complexity of O(n). But even if it is O(n), it is not so slow that it is outrageous. Is there any reason?

During this stage, I mainly searched for information online, which took me 10 days on and off.

System learning

Here are two books recommended. One is "MySQL Technology Insider InnoDB Storage Engine", through which you can have a deeper understanding of InnoDB's implementation mechanism, such as mvcc, index implementation, and file storage.

The second one is "High Performance MySQL", which starts from the usage level, but goes into depth and mentions many design ideas.

By combining the two books and studying them repeatedly, you can barely master MySQL.

There are two key concepts here:

  • Clustered index: contains the primary key index and the corresponding actual data. The leaf node of the index is the data node.
  • Auxiliary index: It can be understood as a secondary node, whose leaf node is also an index node and contains the primary key id.

Even if the first 10,000 will be thrown away, MySQL will use the primary key ID on the secondary index to check the data on the clustered index. This is 10,000 random IOs, so it is naturally as slow as a husky.

You may wonder why this behavior occurs. This is related to the layering of MySQL. Limit offset can only be used for the result set returned by the engine layer. In other words, the engine layer is also innocent, and it does not know that these 10,000 pieces are going to be thrown away.

The following is a diagram of MySQL layering. You can see that the engine layer and the server layer are actually separate.

Until this point, I roughly understood the reason for the slowness. This stage took one year.

comprehend by analogy

I had been working for 3 years at this time and started to look at some source code. After reading etcd, I read some tidb source code. Regardless of the type of database, a query statement is actually composed of logical operators.

Introduction to Logical Operators

Before writing specific optimization rules, let's briefly introduce some logical operators in the query plan.

  • DataSource is the data source, that is, the table, t in select * from t.
  • Selection, for example, select xxx from t where xx = 5, where the filter condition is.
  • Projection, selecting c from t in the query "select c from t" is a projection operation.
  • Join connection, select xx from t1, t2 where t1.c = t2.c means joining the two tables t1 and t2.

Selection, projection, and join (SPJ for short) are the most basic operators. There are many join modes, including inner join, left outer join, right outer join, etc.

After select b from t1, t2 where t1.c = t2.c and t1.a > 5 becomes a logical query plan, the DataSource corresponding to t1 t2 is responsible for retrieving the data.

A Join operator is added above to connect the results of the two tables according to t1.c = t2.c, then a Selection filter is performed according to t1.a > 5, and finally column b is projected.

The following figure is an unoptimized representation:

So it is not that MySQL does not want to pass limit and offset to the engine layer, but because the logical operators are divided, it is impossible to know how much qualified data is contained in the specific operator.

How to solve it

"High Performance MySQL" mentions two solutions

Solution 1

According to the actual business needs, see if it can be replaced with the next page and previous page functions, especially on iOS and Android, where the previous complete paging was not common.

Here, limit and offset are replaced by the auxiliary index (i.e. search condition) id. When the id is called again, it needs to be returned to the front end.

Solution 2

Face it head on. Here is a concept: index coverage: when the data queried by the auxiliary index only includes the ID and the auxiliary index itself, there is no need to query the clustered index.

The idea is as follows: select xxx,xxx from in (select id from table where second_index = xxx limit 10 offset 10000) This sentence means that we first search for the unique database id value corresponding to the data from the conditional query. Because the primary key is already on the secondary index, there is no need to return to the disk of the clustered index to pull it. Then use these 10 limited primary key IDs to query the clustered index. This will only do ten random IOs.

When the business really needs paging, using this solution can greatly improve performance. Usually meets performance requirements.

Final Thoughts

I am very grateful to my master for his guidance and patience in the three years before my graduation. He assigned me reading tasks during holidays, checked on my learning progress during lunch breaks, and guided me to explore problems by asking questions. After I graduated from Tencent, he gave me a lot of advice every time we met, imparted his knowledge and answered my questions, and he did his best in every aspect.

The above is the full content of this article. I hope it will be helpful for everyone’s study. I also hope that everyone will support 123WORDPRESS.COM.

You may also be interested in:
  • Laravel custom paging implementation example offset() and limit()

<<:  How to use Vue's idea to encapsulate a Storage

>>:  Linux user and group command example analysis [switching, adding users, permission control, etc.]

Recommend

How to design and optimize MySQL indexes

Table of contents What is an index? Leftmost pref...

Personal opinion: Talk about design

<br />Choose the most practical one to talk ...

Examples of optimistic locking and pessimistic locking in MySQL

The task of concurrency control in a database man...

Detailed explanation of data types in JavaScript basics

Table of contents 1. Data Type 1.1 Why do we need...

VMware Workstation installation Linux system

From getting started to becoming a novice, the Li...

The difference between mysql outer join and inner join query

The syntax for an outer join is as follows: SELEC...

Docker Tutorial: Using Containers (Simple Example)

If you’re new to Docker, take a look at some of t...

Detailed explanation of Nginx reverse proxy example

1. Reverse proxy example 1 1. Achieve the effect ...

js method to realize shopping cart calculation

This article example shares the specific code of ...

Solve the problem of black screen when starting VMware virtual machine

# Adjust VMware hard disk boot priority Step 1: E...

Install Docker environment in Linux environment (no pitfalls)

Table of contents Installation Prerequisites Step...

Summary of block-level elements, inline elements, and variable elements

Block element p - paragraph pre - format text tabl...