MySQL database basics - How Join operations work

MySQL executes Join with the Nested-Loop Join algorithm, which comes in three variants.

select * from t1 join t2 on t1.a = t2.a;
-- t1 has 100 rows, t2 has 1000 rows

Simple Nested-Loop Join

t1 is the driving table and is scanned in full. For each row in t1, the whole of t2 is scanned to look for matches, so the process performs 100 * 1000 comparisons.

Each full scan of t2 is not guaranteed to be served from memory. Pages of t2 may have been evicted from the Buffer Pool between scans, in which case they have to be read from disk again.
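The double loop described above can be sketched in Python. This is only an illustration of how the comparisons add up, not MySQL's implementation; the row layout (dicts with columns id and a), the sample data, and the function name simple_nested_loop_join are assumptions made for the example.

```python
def simple_nested_loop_join(t1, t2):
    """Join t1 and t2 on column "a" with a plain double loop."""
    comparisons = 0
    result = []
    for r1 in t1:          # driving table: scanned once
        for r2 in t2:      # driven table: scanned in full for every row of t1
            comparisons += 1
            if r1["a"] == r2["a"]:
                result.append((r1, r2))
    return result, comparisons


# Hypothetical data: 100 rows in t1, 1000 rows in t2.
t1 = [{"id": i, "a": i} for i in range(100)]
t2 = [{"id": i, "a": i} for i in range(1000)]

rows, comparisons = simple_nested_loop_join(t1, t2)
print(len(rows), comparisons)  # 100 matching pairs, 100 * 1000 comparisons
```

In this sketch both lists live in memory. In the real case every pass of the inner loop is a full scan of t2 that may touch the disk, which is what makes this variant so expensive.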

Block Nested-Loop Join (used by MySQL when the join cannot use an index on the driven table)

t1 is scanned in full and its rows are loaded into join_buffer. Then t2 is scanned in full, and each row of t2 is compared against the t1 rows cached in join_buffer.

t1 full table scan = 100 rows

t2 full table scan = 1000 rows

Total rows scanned = 1100

Comparisons in join_buffer = 100 * 1000

The number of comparisons is the same as in Simple Nested-Loop Join, but the comparisons happen in memory against join_buffer, and each table is scanned only once. This makes it much faster than Simple Nested-Loop Join.
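A minimal Python sketch of this single-pass case, assuming the whole of t1 fits in the buffer. The list join_buffer merely stands in for MySQL's join_buffer, and the function name and sample data are hypothetical.

```python
def block_nested_loop_join(t1, t2):
    """Join on column "a", caching all of t1 in an in-memory buffer."""
    join_buffer = list(t1)            # one scan of t1 fills the buffer
    rows_scanned = len(join_buffer)
    comparisons = 0
    result = []
    for r2 in t2:                     # one scan of t2
        rows_scanned += 1
        for r1 in join_buffer:        # in-memory comparisons only
            comparisons += 1
            if r1["a"] == r2["a"]:
                result.append((r1, r2))
    return result, rows_scanned, comparisons


# Hypothetical data: 100 rows in t1, 1000 rows in t2.
t1 = [{"id": i, "a": i} for i in range(100)]
t2 = [{"id": i, "a": i} for i in range(1000)]

rows, rows_scanned, comparisons = block_nested_loop_join(t1, t2)
print(rows_scanned, comparisons)  # 1100 rows scanned, 100 * 1000 comparisons
```

The comparison count matches the simple double loop, but only 1100 rows are read from the tables.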

join_buffer has a limited size. If the rows read from t1 do not fit, only part of t1 is loaded first. After t2 has been scanned and compared against that part, join_buffer is cleared and the next part of t1 is loaded. This repeats until all of t1 has been processed.

t1 is still scanned only once in total, just as when it fits into join_buffer in a single pass, but t2 is scanned once per segment, so the number of t2 scans is multiplied by the number of segments.

Assume that the driving table has N rows and must be split into K segments to complete the algorithm, and that the driven table has M rows.

K = λ * N, where λ is between 0 and 1

Rows scanned in the driven table = K * M = λ * N * M

Total rows scanned = N + λ * N * M

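λ depends on the size of join_buffer: the larger the buffer, the smaller λ. When join_buffer is large enough to hold the driving table in one pass (K = 1), N + M rows are scanned whichever table drives the join, so using the large table or the small table as the driving table takes the same time.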

When segmentation is required, the fewer the segments, the fewer times the driven table is scanned, so the small table should be used as the driving table.
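The effect of segmentation can be checked with a small Python sketch. The buffer capacity is expressed in rows here purely for simplicity (the real join_buffer is sized in bytes), and the function name, table sizes, and capacities are assumptions for illustration.

```python
def segmented_block_nested_loop_join(driving, driven, buffer_rows):
    """Block Nested-Loop Join on column "a" with a buffer of limited capacity."""
    rows_scanned = 0
    segments = 0
    result = []
    for start in range(0, len(driving), buffer_rows):
        # Load the next segment of the driving table into the buffer.
        join_buffer = driving[start:start + buffer_rows]
        rows_scanned += len(join_buffer)
        segments += 1
        for r2 in driven:             # the driven table is rescanned per segment
            rows_scanned += 1
            for r1 in join_buffer:
                if r1["a"] == r2["a"]:
                    result.append((r1, r2))
    return result, segments, rows_scanned


# Hypothetical data: a 100-row table and a 1000-row table.
small = [{"id": i, "a": i} for i in range(100)]
large = [{"id": i, "a": i} for i in range(1000)]

# Buffer holds 50 rows: segmentation is needed in both directions.
_, k_small, scanned_small = segmented_block_nested_loop_join(small, large, 50)
_, k_large, scanned_large = segmented_block_nested_loop_join(large, small, 50)
print(k_small, scanned_small)   # 2 segments, 100 + 2 * 1000 = 2100 rows
print(k_large, scanned_large)   # 20 segments, 1000 + 20 * 100 = 3000 rows

# Buffer holds 1000 rows: one segment either way, same scan cost.
_, _, big_buf_small = segmented_block_nested_loop_join(small, large, 1000)
_, _, big_buf_large = segmented_block_nested_loop_join(large, small, 1000)
print(big_buf_small, big_buf_large)  # 1100 rows in both cases
```

With the small buffer, driving from the small table scans fewer rows. With a buffer that holds either table entirely, the choice of driving table no longer changes the number of rows scanned.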

Index Nested-Loop Join (used by MySQL when the join can use an index on the driven table)

Take the SQL above as an example, and assume that column a of t2 is indexed.

t1 is scanned in full, and for each row of t1 the index on t2.a is searched. Once the primary key ID of a matching row is found, the full row is fetched from t2 by primary key (if the join column is the primary key of t2, this second lookup is skipped).

t1 full table scan = 100 rows

t2 index lookup for each row of t1 ≈ log2(1000) steps

t2 primary key lookup for each row of t1 ≈ log2(1000) steps
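The lookup pattern above can be sketched in Python. A sorted list searched with bisect stands in for the index on t2.a, and a dict stands in for the primary key index. Both are simplifications of MySQL's B+ trees, and the function name and sample data are hypothetical.

```python
from bisect import bisect_left


def index_nested_loop_join(t1, t2):
    """Join on column "a" by searching an index on t2.a for each row of t1."""
    # Stand-in for the secondary index on t2.a: (a, id) pairs sorted by a.
    # In MySQL this index already exists; it is built here only for the sketch.
    index_on_a = sorted((row["a"], row["id"]) for row in t2)
    # Stand-in for the primary key index of t2: id -> full row.
    primary_key = {row["id"]: row for row in t2}

    index_lookups = 0
    table_lookups = 0
    result = []
    for r1 in t1:                                   # full scan of the driving table
        index_lookups += 1
        # Binary search for the first index entry whose a equals r1["a"].
        pos = bisect_left(index_on_a, (r1["a"],))
        while pos < len(index_on_a) and index_on_a[pos][0] == r1["a"]:
            row_id = index_on_a[pos][1]
            table_lookups += 1                      # fetch the full row by primary key
            result.append((r1, primary_key[row_id]))
            pos += 1
    return result, index_lookups, table_lookups


# Hypothetical data: 100 rows in t1, 1000 rows in t2.
t1 = [{"id": i, "a": i} for i in range(100)]
t2 = [{"id": i, "a": i} for i in range(1000)]

rows, index_lookups, table_lookups = index_nested_loop_join(t1, t2)
print(len(rows), index_lookups, table_lookups)
```

During the join itself t2 is never scanned in full: each row of t1 costs one index search plus one primary key fetch per match.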

Assume that the number of data rows in the driving table is N, and the number of data rows in the driven table is M.

Total number of queries ≈ N + N * 2 * log2(M)

N has a much greater effect on this total than M, which only appears inside the logarithm. The more rows the driving table has, the more queries are needed, so the small table should be used as the driving table.
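Plugging the example sizes into the estimate makes the difference concrete. The function below simply evaluates the approximation N + N * 2 * log2(M); the name inlj_cost is hypothetical, and the result is a rough operation count, not a measured query time.

```python
import math


def inlj_cost(n_driving, m_driven):
    """Approximate cost of Index Nested-Loop Join: N + N * 2 * log2(M)."""
    return n_driving + n_driving * 2 * math.log2(m_driven)


small_drives = inlj_cost(100, 1000)   # 100-row table drives the join
large_drives = inlj_cost(1000, 100)   # 1000-row table drives the join
print(round(small_drives), round(large_drives))
```

With these sizes the estimate is roughly 2,093 when the 100-row table drives the join and roughly 14,288 when the 1000-row table does.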

This article is based on "MySQL Practical 45 Lectures--Lecture 34".
