A brief discussion on the underlying principle of mysql join

join algorithm
The difference between driven and non-driven tables

1. Simple Nested-Loop Join, simple nesting-no index
2. Index Nested-Loop Join-Indexed Case
3. Block Nested-Loop Join, join buffer

Buffer size

How to choose the connection order for tables with large and small data volumes

detail

join algorithm

MySQL only supports one join algorithm: Nested-Loop Join, but Nested-Loop Join has three variants:

Simple Nested-Loop Join, simple nesting
Index Nested-Loop Join
Block Nested-Loop Join, join buffer nesting (temporary table)

The difference between driven and non-driven tables

The driving table is the main table, and the non-driven table is the slave table. You can see it by looking at the following SQL. A is the driving table and B is the non-driven table.

select * from A left join B

When A joins B, must we check table A first and then table B?

The answer is not necessarily, because MySQL has an optimizer inside, which will make some optimizations based on your query statements. The optimizer also determines which table to check first, but it is certain that the table that is checked first is the driving table, and vice versa. As for the question of which table to check first, we can get the result by looking at the execution plan; just add the explain keyword in front;

explain select * from A join B;

1. Simple Nested-Loop Join, simple nesting-no index

A left join B: Each time, the whole table is matched. Each row of data in table A is matched once in table B. That is to say, if there are 10 records in table A and 1000 records in table B, the number of scans during query is 10*1000, which means that the query needs to scan 10000 times to get the data.

2. Index Nested-Loop Join-Indexed Case

select * from A join B on A.id=B.id where A.id = 1: When querying, the driver table A will search according to the index of the associated field. When a matching value is found on the index, the query will be returned to the table. In other words, the query will be returned to the table only after the index is matched.
If the associated field B.id of the non-driven table B is the primary key, the performance will be very high. If it is not the primary key, multiple table queries will be performed. First, the index will be associated, and then the table query will be performed based on the primary key id of the secondary index. The performance is slower than that of the primary key.

3. Block Nested-Loop Join, join buffer

If there is an index, Index Nested-Loop Join will be used to join the tables. If there is no index on the join column, Block Nested-Loop Join will be used. Join buffer. There is a buffer between the driver table and the non-driver table. When querying, the data of the driver table is first cached in the buffer, and then matched with the non-driver table in batches. This is an optimization solution that combines multiple comparisons into one comparison. Note: not only the columns of the associated table are cached here, but also the columns after the select.

Buffer size

By default, the capacity of the buffer join_biffer_size is 256k. If your data space is larger than 256k, the buffer cannot be used and the join will be converted to the simplest Simple Nested-Loop Join. However, we can manually adjust the buffer size to load large amounts of data. View the join_biffer_size sql: show variables like '%join_biffer_size%'

How to choose the connection order for tables with large and small data volumes

It is best to connect the small table to the big table, which will reduce the number of scans; for example, if the big table has 1,000 data records and the small table has only 10 data records, then the best connection method is: small table joins the big table; why do you do this?

If a large table is joined to a small table, and our data is in row 999 of the large table, then we have to scan it at least 999 times to find it when we query the data.
If a small table is joined to a large table, and our data is in row 9 of the small table,

detail

When querying tables together, it is best not to have more than three tables, because the fields that need to be joined must have the same data type.
Inner joins are preferred for joining tables. Outer joins consume more performance than inner joins in querying data.
Make sure the columns after on in the associated query or the fields in using() have indexes to speed up data access when joining tables.

This is the end of this article about the underlying principles of MySQL join. For more information about the underlying principles of MySQL join, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:

Analysis of usage scenarios of JOIN in SQL statements
MYSQL database basics - Join operation principle
How to solve the problem of invalid left join in MySQL and the precautions for its use
The process of quickly converting mysql left join to inner join
Why do code standards require SQL statements not to have too many joins?
MySQL efficient query left join and group by (plus index)
MySQL join buffer principle
Detailed explanation of various join summaries of SQL

<<: Vue implements small search function

>>: Determine the direction of mouse entry based on CSS

A brief discussion on the underlying principle of mysql join

Table of contents

join algorithm

The difference between driven and non-driven tables

1. Simple Nested-Loop Join, simple nesting-no index

2. Index Nested-Loop Join-Indexed Case

3. Block Nested-Loop Join, join buffer

Buffer size

How to choose the connection order for tables with large and small data volumes

detail

Docker Machine in-depth explanation

Example code for implementing the nine-grid layout of dynamic images with CSS

Do you know all 24 methods of JavaScript loop traversal?

Solution to the problem of information loss with "_" in header when using Nginx proxy

Overview and differences between html inline elements and html block-level elements

Implementation and usage scenarios of JS anti-shake throttling function

CSS3 uses transform deformation combined with events to complete fan-shaped navigation

Solve the problem that element edit form el-radio cannot be selected after echoing

MySQL 5.7.16 free installation version graphic tutorial under Linux

Docker implements container port binding local port

Recommend

Solution to the failure of loading dynamic library when Linux program is running

The use and difference between JavaScript pseudo-array and array

Tutorial on installing mongodb under linux

Description of the default transaction isolation level of mysql and oracle

Detailed explanation of VUE Token's invalidation process

JavaScript implements simple date effects

Complete MySQL Learning Notes

Detailed steps for smooth transition from MySQL to MariaDB

Exploring the practical value of the CSS property *-gradient

Detailed explanation of the observer mode starting from the component value transfer of Vue

Detailed tutorial on deploying Django project under CentOS

Explanation of factors affecting database performance in MySQL

Zabbix monitors the process of Linux system services

Instructions for recovering data after accidental deletion of MySQL database

SSH port forwarding to achieve intranet penetration