A brief discussion on the underlying principle of mysql join

A brief discussion on the underlying principle of mysql join

join algorithm

MySQL only supports one join algorithm: Nested-Loop Join, but Nested-Loop Join has three variants:

  • Simple Nested-Loop Join, simple nesting
  • Index Nested-Loop Join
  • Block Nested-Loop Join, join buffer nesting (temporary table)

The difference between driven and non-driven tables

The driving table is the main table, and the non-driven table is the slave table. You can see it by looking at the following SQL. A is the driving table and B is the non-driven table.

select * from A left join B

When A joins B, must we check table A first and then table B?

The answer is not necessarily, because MySQL has an optimizer inside, which will make some optimizations based on your query statements. The optimizer also determines which table to check first, but it is certain that the table that is checked first is the driving table, and vice versa. As for the question of which table to check first, we can get the result by looking at the execution plan; just add the explain keyword in front;

explain select * from A join B;

1. Simple Nested-Loop Join, simple nesting-no index

A left join B: Each time, the whole table is matched. Each row of data in table A is matched once in table B. That is to say, if there are 10 records in table A and 1000 records in table B, the number of scans during query is 10*1000, which means that the query needs to scan 10000 times to get the data.

2. Index Nested-Loop Join-Indexed Case

select * from A join B on A.id=B.id where A.id = 1: When querying, the driver table A will search according to the index of the associated field. When a matching value is found on the index, the query will be returned to the table. In other words, the query will be returned to the table only after the index is matched.
If the associated field B.id of the non-driven table B is the primary key, the performance will be very high. If it is not the primary key, multiple table queries will be performed. First, the index will be associated, and then the table query will be performed based on the primary key id of the secondary index. The performance is slower than that of the primary key.

3. Block Nested-Loop Join, join buffer

If there is an index, Index Nested-Loop Join will be used to join the tables. If there is no index on the join column, Block Nested-Loop Join will be used. Join buffer. There is a buffer between the driver table and the non-driver table. When querying, the data of the driver table is first cached in the buffer, and then matched with the non-driver table in batches. This is an optimization solution that combines multiple comparisons into one comparison. Note: not only the columns of the associated table are cached here, but also the columns after the select.

Buffer size

By default, the capacity of the buffer join_biffer_size is 256k. If your data space is larger than 256k, the buffer cannot be used and the join will be converted to the simplest Simple Nested-Loop Join. However, we can manually adjust the buffer size to load large amounts of data. View the join_biffer_size sql: show variables like '%join_biffer_size%'

How to choose the connection order for tables with large and small data volumes

It is best to connect the small table to the big table, which will reduce the number of scans; for example, if the big table has 1,000 data records and the small table has only 10 data records, then the best connection method is: small table joins the big table; why do you do this?

  • If a large table is joined to a small table, and our data is in row 999 of the large table, then we have to scan it at least 999 times to find it when we query the data.
  • If a small table is joined to a large table, and our data is in row 9 of the small table,

detail

  • When querying tables together, it is best not to have more than three tables, because the fields that need to be joined must have the same data type.
  • Inner joins are preferred for joining tables. Outer joins consume more performance than inner joins in querying data.
  • Make sure the columns after on in the associated query or the fields in using() have indexes to speed up data access when joining tables.

This is the end of this article about the underlying principles of MySQL join. For more information about the underlying principles of MySQL join, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Analysis of usage scenarios of JOIN in SQL statements
  • MYSQL database basics - Join operation principle
  • How to solve the problem of invalid left join in MySQL and the precautions for its use
  • The process of quickly converting mysql left join to inner join
  • Why do code standards require SQL statements not to have too many joins?
  • MySQL efficient query left join and group by (plus index)
  • MySQL join buffer principle
  • Detailed explanation of various join summaries of SQL

<<:  Vue implements small search function

>>:  Determine the direction of mouse entry based on CSS

Recommend

Solution to the failure of loading dynamic library when Linux program is running

Unable to load dynamic library under Linux When t...

The use and difference between JavaScript pseudo-array and array

Pseudo-arrays and arrays In JavaScript, except fo...

Tutorial on installing mongodb under linux

MongoDB is cross-platform and can be installed on...

Description of the default transaction isolation level of mysql and oracle

1. Transaction characteristics (ACID) (1) Atomici...

Detailed explanation of VUE Token's invalidation process

Table of contents Target Thought Analysis Code la...

JavaScript implements simple date effects

The specific code of JavaScript date effects is f...

Complete MySQL Learning Notes

Table of contents MyISAM and InnoDB Reasons for p...

Detailed steps for smooth transition from MySQL to MariaDB

1. Introduction to MariaDB and MySQL 1. Introduct...

Exploring the practical value of the CSS property *-gradient

Let me first introduce an interesting property - ...

Detailed tutorial on deploying Django project under CentOS

Basic Environment Pagoda installation service [Py...

Explanation of factors affecting database performance in MySQL

A story about database performance During the int...

Zabbix monitors the process of Linux system services

Zabbix automatically discovers rules to monitor s...

Instructions for recovering data after accidental deletion of MySQL database

In daily operation and maintenance work, backup o...

SSH port forwarding to achieve intranet penetration

The machines in our LAN can access the external n...