What are the advantages of using B+ tree index in MySQL?

What are the advantages of using B+ tree index in MySQL?

Before understanding this problem, let's first look at the storage structure of the MySQL table, and then compare the differences between binary trees, multi-trees, B trees, and B+ trees.

MySQL storage structure

Table storage structure

Unit: table > segment > area > page > row

In the database, whether you read one row or multiple rows, the pages where these rows are located are loaded. That is to say, the basic unit of storage space is page.
A page is a node of a B+ tree. The smallest unit of database I/O operation is a page, and all database-related content is stored in the page structure.

B+ tree index structure

  1. In a B+ tree, each node is a page. Every time a new node is created, a page space is requested.
  2. The nodes on the same layer are between, and a bidirectional linked list is formed through the page structure.
  3. Non-leaf nodes include multiple index rows, each of which stores the index key and a pointer to the next level page.
  4. The leaf node stores keywords and row records. There is a one-way linked list between the records inside the node (that is, inside the page structure).

B+ tree page node structure

There are several characteristics

  1. Divide all records into several groups, each group will store multiple records,
  2. The page directory stores slots, which are equivalent to the index of grouped records. Each slot pointer points to the last record of a different group.
  3. We locate the group through the slot and then view the records in the group

The main function of a page is to store records, and records are stored in a page in the form of a single linked list.
The advantage of a singly linked list is that it is easy to insert and delete, but its disadvantage is that the retrieval efficiency is low, and in the worst case all nodes in the linked list must be traversed. Therefore, a binary search method is provided in the page directory to improve the efficiency of record retrieval.

B+ tree retrieval process

Let's take a look at the retrieval process of the B+ tree.

  1. Starting from the root of the B+ tree, find the leaf nodes layer by layer.
  2. Find the leaf node as the corresponding data page, load the data leaf into the memory, and use binary search through the slots of the page directory to first find a rough record grouping.
  3. The records are searched in the group by traversing the linked list.

Why use B+ tree index?

The database accesses data through pages. A page is a B+ tree node. Accessing a node is equivalent to an I/O operation, so the faster the node can be found, the better the search performance.
The characteristic of B+ tree is that it is short and fat enough, which can effectively reduce the number of node accesses and thus improve performance.

Next, let's compare a binary tree, a multi-fork tree, a B-tree, and a B+ tree.

Binary Tree

A binary tree is a binary search tree with good search performance, which is equivalent to a binary search.
But when N is larger, the depth of the tree is higher. The time for data query mainly depends on the number of disk IO. The deeper the binary tree is, the more searches are performed and the worse the performance is.
The worst case is that it degenerates into a linked list, as shown below

In order to prevent the binary tree from degenerating into a linked list, people invented the AVL tree (balanced binary search tree): the height difference between the left and right subtrees of any node is at most 1.

Multi-branch tree

A multi-fork tree can have M nodes, which can effectively reduce the height. When the height is reduced, the number of nodes is reduced and the I/O is naturally reduced, and the performance is better than that of a binary tree.

B-Tree

A B-tree is simply a multi-branch tree, where each leaf stores data and a pointer to the next node.

For example, to find 9, the steps are as follows

  1. We compare it with the keyword (17, 35) of the root node. 9 is less than 17, so we get pointer P1.
  2. Follow pointer P1 to find disk block 2, the key is (8, 12), because 9 is between 8 and 12, so we get pointer P2;
  3. Follow pointer P2 to find disk block 6, the key is (9, 10), and then we find key 9.

B+ Tree

B+ tree is an improvement of B tree. Simply put, only leaf nodes store data, and non-leaf nodes are storage pointers; all leaf nodes form an ordered linked list.

The internal nodes of the B+ tree do not have pointers to the specific information of the keyword, so its internal nodes are smaller than those of the B tree. If all the keywords of the same internal node are stored in the same disk block, the disk block can accommodate more keywords, and the number of keywords that need to be searched at one time is also more, which reduces the relative IO read and write times.

For example, to search for keyword 16, the steps are as follows

  1. Compare with the root node's keyword (1, 18, 35), 16 is between 1 and 18, and get pointer P1 (pointing to disk block 2)
  2. Find disk block 2, the key is (1, 8, 14), because 16 is greater than 14, so get pointer P3 (pointing to disk block 7)
  3. We find disk block 7, the key is (14, 16, 17), and then we find key 16, so we can find the data corresponding to key 16.

The difference between B+ tree and B tree:

  1. Non-leaf nodes of B+ trees do not have data but only indexes. Non-leaf nodes of B trees store data.
  2. B+ tree query is more efficient. The B+ tree uses a bidirectional linked list to connect all leaf nodes, making range queries more efficient (because all data is in the leaf nodes of the B+ tree, and scanning the database only requires scanning the leaf nodes once), but the B-tree requires an in-order traversal to complete the search range.
  3. B+ tree query efficiency is more stable. The B+ tree must query the leaf node every time to find data, and the data queried by the B-tree may not be in the leaf node, or it may be in the leaf node, which will cause the query efficiency to be unstable.
  4. B+ trees have lower disk read and write costs. The internal nodes of the B+ tree do not have pointers to the specific information of the keyword, so its internal nodes are smaller than those of the B-tree. Usually, the B+ tree is shorter and fatter, and the small query generates less I/O.

That’s why MySQL uses B+ trees, it’s that simple!

The above is the detailed content of the advantages of using B+ tree index in MySQL. For more information about using B+ tree index in MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Why does MySQL database index choose to use B+ tree?
  • What are the benefits of using B+ tree as index structure in MySQL?
  • Analysis of the reasons why MySQL's index system uses B+ tree
  • The reason why MySQL uses B+ tree as its underlying data structure
  • Detailed explanation of the difference between B-tree index and B+ tree index in MySQL
  • Detailed explanation of MySQL B+ tree index and hash index
  • An article to understand why the MySQL index data structure uses B+ tree

<<:  CSS implementation code for drawing triangles (border method)

>>:  HTML mouse css control

Recommend

Detailed steps to install the NERDTree plugin in Vim on Ubuntu

NERDTree is a file system browser for Vim. With t...

MySQL 8.0.16 installation and configuration tutorial under CentOS7

Uninstall the old version of MySQL (skip this ste...

A brief discussion on the problem of forgotten mysql password and login error

If you forget your MySQL login password, the solu...

AsyncHooks asynchronous life cycle in Node8

Async Hooks is a new feature of Node8. It provide...

JavaScript to achieve tab switching effect

This article shares the specific code of JavaScri...

How to install mongodb 4.2 using yum on centos8

1. Make a repo file Refer to the official install...

IDEA2020.1.2 Detailed tutorial on creating a web project and configuring Tomcat

This article is an integrated article on how to c...

Implementation of Docker building Maven+Tomcat basic image

Preface In Java programming, most applications ar...

Linux beginners in virtual machines configure IP and restart the network

For those who are new to virtual machines or have...

How to solve the problem of margin overlap

1. First, you need to know what will trigger the v...

HTML text escape tips

Today I saw a little trick for HTML text escaping ...

RHEL7.5 mysql 8.0.11 installation tutorial

This article records the installation tutorial of...

How to install git on linux

1. Introduction Git is a free, open source distri...

Solution to ES memory overflow when starting docker

Add the jvm.options file to the elasticsearch con...