What are the advantages of using B+Tree as an index in MySQL?

What are the advantages of using B+Tree as an index in MySQL?

Why do databases need indexes?

We all know that database data is stored on disk. When our program is started, it is equivalent to a process running in the machine's memory. So when our program wants to query data, it must go out of the memory to the disk to find the data, and then write the data back to the memory. However, the IO efficiency of the disk is far lower than that of the memory, so the speed of searching for data directly affects the efficiency of the program.
The main purpose of adding indexes to a database is to use a suitable data structure, which can make data query more efficient, reduce the number of disk IOs, and increase the speed of data search, rather than a foolish global traversal.

Why does the index use the B+Tree data structure?

If we think simply, if we want to find data quickly, it seems that the hash table is the fastest. According to the key, hash it to a certain slot, and then we can find the location of the data accurately with just one search. How fast is that? However, when we are doing business, we often only need one piece of data. Most of the needs are to query a part of the data based on certain conditions. At this time, hash display is not very suitable.

Let's consider trees, such as binary trees, balanced binary trees, red-black trees, B-trees, etc. They all use binary search and are fast at finding numbers. However, no matter whether it is a balanced binary tree or an optimized red-black tree, they are all binary trees in the final analysis. When there are more nodes, their height will be higher. Let me find some data. If the root node is not there, then I will look for the next layer. If the next layer still doesn’t have the data, I will look for the next layer again. The consequence of this is that I may have to search for a piece of data several times, and each time a disk IO is executed. The purpose of our index is to reduce disk IO, so this design is not acceptable. So can we just reduce the height?
So let's consider the B-tree again. First, let's briefly introduce the data structure of the B-tree:
First, let's look at the definition of B-tree.

  1. Each node has at most m-1 keywords (key-value pairs that can be stored).
  2. The root node can have at least one keyword.
  3. A non-root node has at least m/2 keywords.
  4. The keywords in each node are arranged in ascending order. All keywords in the left subtree of each keyword are smaller than it, and all keywords in the right subtree are larger than it.
  5. All leaf nodes are located in the same layer, or the length from the root node to each leaf node is the same.
  6. Each node stores index and data, that is, the corresponding key and value.

Therefore, the range of the number of keywords for the root node is: 1 <= k <= m-1, and the range of the number of keywords for non-root nodes is: m/2 <= k <= m-1.

Here m represents the order, which indicates how many child nodes a node has at most, so the order of a B-tree needs to be specified when describing it.

Let's take another example to illustrate the above concept. For example, here is a 5-order B-tree with a root node number range of 1 <= k <= 4 and a non-root node number range of 2 <= k <= 4.

Next, we will explain the insertion process of the B-tree through an insertion example, and then explain the process of deleting keywords.

B-Tree Insert

When inserting, we need to remember a rule: determine whether the number of keys of the current node is less than or equal to m-1. If it is satisfied, just insert it directly. If not, use the middle key of the node to divide the node into two parts, and put the middle node into the parent node.

Example: In a 5-order B-tree, a node has a maximum of 4 keys and a minimum of 2 keys (Note: the following nodes are uniformly represented by one node to represent key and value).

Insert 18, 70, 50, 40

Insert 22

When inserting 22, it is found that the keyword of this node is already greater than 4, so it needs to be split. The rules for splitting have been mentioned above. After the split, it is as follows.

Then insert 23, 25, 39

Split and get the following.

Therefore, the number of nodes in each layer of the B-tree will increase. For the same amount of data, the B-tree will be lower than the binary tree, and the number of IO operations required will be reduced, so it meets our indexing requirements. So why did MySQL finally choose B+ tree? How is it better than B tree?
Let's first look at the differences between B+ trees and B trees:

  • The B+ tree leaf nodes contain all the key values ​​of the tree. Non-leaf nodes do not store data, only indexes. Data is stored in leaf nodes. In B-tree, each node stores index and data.
  • Each leaf node of the B+ tree stores pointers to adjacent leaf nodes, and the leaf nodes themselves are linked in ascending order according to the size of the keyword.

As shown in the figure:

First point: When non-leaf nodes only store index keys but not data, the space occupied by non-leaf nodes can be reduced. Nodes with the same capacity can store more indexes. For the same three-layer B+ tree, the number of levels will increase, and it can store more data than the B-tree.
The second point: B+ tree leaf nodes store pointers to adjacent leaf nodes. To understand the benefits of this pointer, we first need to know that when the disk reads data, it is often not strictly read on demand, but it will pre-read every time. Even if only one byte is needed, the disk will start from this position and read a certain length of data backward in sequence into the memory. The theoretical basis for this is the famous locality principle in computer science:

  • When a piece of data is used, the nearby data will usually be used immediately.
  • The data required during program execution is usually concentrated.

The length of the pre-read is generally an integer multiple of a page. A page is a logical block of computer management memory. Hardware and operating systems often divide main memory and disk storage areas into continuous blocks of equal size. Each storage block is called a page (in many operating systems, the page size is usually 4k). Main memory and disk exchange data in units of pages. When the data that the program wants to read is not in the main memory, a page fault exception will be triggered. At this time, the system will send a read signal to the disk. The disk will find the starting position of the data and read one or several pages continuously and load them into the memory. Then the exception will be returned and the program will continue to run.

Now let's look at the pointer of the B+ tree child node, and we understand its use. When reading in advance, it can ensure that the data read continuously is in order.

Some students may have mentioned the B* tree, which is based on the B+ tree and adds linked list pointers for non-leaf nodes. Personally, I think the B-star tree is unnecessary because we don't store data in non-leaf nodes. The data is all in the leaf nodes, and the linked list pointers in non-leaf nodes are not used.

Some fancy concepts

Clustered index and non-clustered index: As mentioned above, the leaf nodes of the B+ tree store the data of the index key, but different MySQL engines have different choices for storing data. MyISAM stores the index file and the actual data file in two files. The data stored in the index file is the address value of the data corresponding to the index key in the data file, while InnoDB stores the formal data in the leaf nodes. Therefore, clustering and non-clustering are to distinguish whether the data stored in the leaf nodes is real (it can be understood as whether the leaf nodes are crowded?)

Returning to the table: Returning to the table is also simple, but you must first understand the primary key index and the ordinary index. The leaf nodes we mentioned above store real data, which is only stored in the primary key index. The data stored in the ordinary index is the key of the primary key index. Then it is easier for us to understand. For example, I have created a normal index for the name field of a table. I want to select * from table where name = 'test'. When we find the test node, the key we get is only the primary key corresponding to this row of data. If we want to get the data of the entire row, we can only use this key to search the primary key index tree again. This operation is called table return.

Leftmost matching principle: When we create a new composite index, such as (name+age), when querying using where name = xx and age = xx, the composite index will be used, while where age = xx and name = xx will not be used. This is because the rule for MySQL to create a joint index is to first sort the leftmost field of the joint index, and then sort the second field based on the sorting of the first field.

The above is the detailed content of the advantages of using B+Tree as index in MySQL. For more information about the advantages of using B+Tree as index in MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • What are the advantages of using B+ tree index in MySQL?
  • What are the benefits of using B+ tree as index structure in MySQL?
  • Why does MySQL database index choose to use B+ tree?
  • How to get the height of MySQL innodb B+tree
  • Detailed explanation of the difference between MySQL normal index and unique index
  • A brief discussion on which fields in Mysql are suitable for indexing
  • MySQL uses covering index to avoid table return and optimize query

<<:  Is it necessary to give alt attribute to img image tag?

>>:  Web Design Teaching or Learning Program

Recommend

How to implement rounded corners with CSS3 using JS

I found an example when I was looking for a way t...

Detailed Introduction to Nginx Installation and Configuration Rules

Table of contents 1. Installation and operation o...

How to use LibreOffice to convert document formats under CentOS

Project requirements require some preprocessing o...

Examples of using temporary tables in MySQL

I've been a little busy these two days, and t...

nginx+tomcat example of accessing the project through the domain name

I was curious about how to access the project usi...

MySQL service and database management

Table of contents 1. Start and stop service instr...

JS realizes automatic playback of timeline

Recently, I have implemented such an effect: clic...

MySQL table name case selection

Table of contents 1. Parameters that determine ca...

How to implement insert if none and update if yes in MySql

summary In some scenarios, there may be such a re...

Basic learning tutorial of table tag in HTML

Table label composition The table in HTML is comp...

Nginx domain name SSL certificate configuration (website http upgraded to https)

Preface HTTP and HTTPS In our daily life, common ...

Linux directory switching implementation code example

Switching files is a common operation in Linux. W...