What are the advantages of using B+ tree index in MySQL?

Before understanding this problem, let's first look at the storage structure of the MySQL table, and then compare the differences between binary trees, multi-trees, B trees, and B+ trees.

MySQL storage structure

Table storage structure

Unit: table > segment > area > page > row

In the database, whether you read one row or multiple rows, the pages where these rows are located are loaded. That is to say, the basic unit of storage space is page.
A page is a node of a B+ tree. The smallest unit of database I/O operation is a page, and all database-related content is stored in the page structure.

B+ tree index structure

In a B+ tree, each node is a page. Every time a new node is created, a page space is requested.
The nodes on the same layer are between, and a bidirectional linked list is formed through the page structure.
Non-leaf nodes include multiple index rows, each of which stores the index key and a pointer to the next level page.
The leaf node stores keywords and row records. There is a one-way linked list between the records inside the node (that is, inside the page structure).

B+ tree page node structure

There are several characteristics

Divide all records into several groups, each group will store multiple records,
The page directory stores slots, which are equivalent to the index of grouped records. Each slot pointer points to the last record of a different group.
We locate the group through the slot and then view the records in the group

The main function of a page is to store records, and records are stored in a page in the form of a single linked list.
The advantage of a singly linked list is that it is easy to insert and delete, but its disadvantage is that the retrieval efficiency is low, and in the worst case all nodes in the linked list must be traversed. Therefore, a binary search method is provided in the page directory to improve the efficiency of record retrieval.

B+ tree retrieval process

Let's take a look at the retrieval process of the B+ tree.

Starting from the root of the B+ tree, find the leaf nodes layer by layer.
Find the leaf node as the corresponding data page, load the data leaf into the memory, and use binary search through the slots of the page directory to first find a rough record grouping.
The records are searched in the group by traversing the linked list.

Why use B+ tree index?

The database accesses data through pages. A page is a B+ tree node. Accessing a node is equivalent to an I/O operation, so the faster the node can be found, the better the search performance.
The characteristic of B+ tree is that it is short and fat enough, which can effectively reduce the number of node accesses and thus improve performance.

Next, let's compare a binary tree, a multi-fork tree, a B-tree, and a B+ tree.

Binary Tree

A binary tree is a binary search tree with good search performance, which is equivalent to a binary search.
But when N is larger, the depth of the tree is higher. The time for data query mainly depends on the number of disk IO. The deeper the binary tree is, the more searches are performed and the worse the performance is.
The worst case is that it degenerates into a linked list, as shown below

In order to prevent the binary tree from degenerating into a linked list, people invented the AVL tree (balanced binary search tree): the height difference between the left and right subtrees of any node is at most 1.

Multi-branch tree

A multi-fork tree can have M nodes, which can effectively reduce the height. When the height is reduced, the number of nodes is reduced and the I/O is naturally reduced, and the performance is better than that of a binary tree.

B-Tree

A B-tree is simply a multi-branch tree, where each leaf stores data and a pointer to the next node.

For example, to find 9, the steps are as follows

We compare it with the keyword (17, 35) of the root node. 9 is less than 17, so we get pointer P1.
Follow pointer P1 to find disk block 2, the key is (8, 12), because 9 is between 8 and 12, so we get pointer P2;
Follow pointer P2 to find disk block 6, the key is (9, 10), and then we find key 9.

B+ Tree

B+ tree is an improvement of B tree. Simply put, only leaf nodes store data, and non-leaf nodes are storage pointers; all leaf nodes form an ordered linked list.

The internal nodes of the B+ tree do not have pointers to the specific information of the keyword, so its internal nodes are smaller than those of the B tree. If all the keywords of the same internal node are stored in the same disk block, the disk block can accommodate more keywords, and the number of keywords that need to be searched at one time is also more, which reduces the relative IO read and write times.

For example, to search for keyword 16, the steps are as follows

Compare with the root node's keyword (1, 18, 35), 16 is between 1 and 18, and get pointer P1 (pointing to disk block 2)
Find disk block 2, the key is (1, 8, 14), because 16 is greater than 14, so get pointer P3 (pointing to disk block 7)
We find disk block 7, the key is (14, 16, 17), and then we find key 16, so we can find the data corresponding to key 16.

The difference between B+ tree and B tree:

Non-leaf nodes of B+ trees do not have data but only indexes. Non-leaf nodes of B trees store data.
B+ tree query is more efficient. The B+ tree uses a bidirectional linked list to connect all leaf nodes, making range queries more efficient (because all data is in the leaf nodes of the B+ tree, and scanning the database only requires scanning the leaf nodes once), but the B-tree requires an in-order traversal to complete the search range.
B+ tree query efficiency is more stable. The B+ tree must query the leaf node every time to find data, and the data queried by the B-tree may not be in the leaf node, or it may be in the leaf node, which will cause the query efficiency to be unstable.
B+ trees have lower disk read and write costs. The internal nodes of the B+ tree do not have pointers to the specific information of the keyword, so its internal nodes are smaller than those of the B-tree. Usually, the B+ tree is shorter and fatter, and the small query generates less I/O.

That’s why MySQL uses B+ trees, it’s that simple!

The above is the detailed content of the advantages of using B+ tree index in MySQL. For more information about using B+ tree index in MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:

Why does MySQL database index choose to use B+ tree?
What are the benefits of using B+ tree as index structure in MySQL?
Analysis of the reasons why MySQL's index system uses B+ tree
The reason why MySQL uses B+ tree as its underlying data structure
Detailed explanation of the difference between B-tree index and B+ tree index in MySQL
Detailed explanation of MySQL B+ tree index and hash index
An article to understand why the MySQL index data structure uses B+ tree

<<: CSS implementation code for drawing triangles (border method)

>>: HTML mouse css control