MySQL index for beginners

Preface

Since the most important data structure in MySQL index is B+ tree, let's first briefly talk about the principle of B+ tree.

B+ Tree Principle

1. Data Structure

B Tree refers to Balance Tree, which is a balanced tree. A balanced tree is a search tree where all leaf nodes are located at the same level.

B+ Tree is implemented based on B Tree and leaf node sequential access pointers. It has the balance of B Tree and improves the performance of interval query through sequential access pointers.

In a B+ Tree, the keys in a node are arranged non-decreasingly from left to right. If the left and right adjacent keys of a pointer are keyi and keyi+1, and they are not null, then all the keys of the node pointed to by the pointer are greater than or equal to keyi and less than or equal to keyi+1.

2. Operation

When performing a search operation, first perform a binary search on the root node to find a pointer to a key, and then recursively search the node pointed to by the pointer. Until a leaf node is found, then a binary search is performed on the leaf node to find the data corresponding to the key.

Insertion and deletion operations will destroy the balance of the balanced tree. Therefore, after the insertion and deletion operations, the tree needs to be split, merged, rotated, etc. to maintain the balance.

3. Comparison with Red-Black Tree

Balanced trees such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B+ Tree as the index structure, mainly for the following two reasons:

1. Fewer searches

The time complexity of a balanced tree search operation is equal to the tree height h, which is roughly O(h)=O(logdN), where d is the out-degree of each node.

The out-degree of the red-black tree is 2, while the out-degree of the B+ Tree is generally very large, so the tree height h of the red-black tree is obviously much larger than that of the B+ Tree, and the number of searches is also greater.

(II) Using the disk pre-reading feature

In order to reduce disk I/O, the disk is often not read strictly on demand, but reads in advance every time. During the pre-read process, the disk performs sequential reading. Sequential reading does not require disk seek and only requires a short rotation time, so the speed is very fast.

The operating system generally divides memory and disk into solid-size blocks, each of which is called a page, and memory and disk exchange data in units of pages. The database system sets the size of an index node to the size of a page so that a node can be fully loaded in one I/O. And by utilizing the pre-reading feature, adjacent nodes can also be pre-loaded.

MySQL Indexes

Indexes are implemented at the storage engine level, not at the server level, so different storage engines have different index types and implementations.

1. B+Tree Index

Is the default index type for most MySQL storage engines.

Since there is no need to scan the entire table, only the tree needs to be searched, so the search speed is much faster.

In addition to searching, it can also be used for sorting and grouping.

You can specify multiple columns as index columns, and multiple index columns together form the key.

Applicable to full key value, key value range and key prefix search, among which key prefix search is only applicable to the leftmost prefix search. If the search is not in the order of the index columns, the index cannot be used.

InnoDB's B+Tree index is divided into primary index and auxiliary index. The data field of the leaf node of the primary index records the complete data record. This indexing method is called a clustered index. Because it is impossible to store data rows in two different places, a table can have only one clustered index.

The data field of the leaf node of the auxiliary index records the value of the primary key. Therefore, when using the auxiliary index for search, you need to first find the primary key value and then search in the primary index.

2. Hash Index

Hash indexes can be searched in O(1) time, but they lose orderliness: they cannot be used for sorting and grouping; they only support exact searches, and cannot be used for partial searches or range searches. The InnoDB storage engine has a special feature called "adaptive hash index". When an index value is used very frequently, a hash index will be created on top of the B+Tree index. This gives the B+Tree index some of the advantages of the hash index, such as fast hash search.

3. Full-text indexing

The MyISAM storage engine supports full-text indexing, which is used to find keywords in text instead of directly comparing them for equality.

The search condition uses MATCH AGAINST instead of the normal WHERE.

The full-text index is implemented using an inverted index, which records the mapping of keywords to the documents in which they are located.

The InnoDB storage engine also began to support full-text indexing in MySQL version 5.6.4.

4. Spatial Data Index

The MyISAM storage engine supports spatial data indexes (R-Tree) and can be used for geographic data storage. Spatial data indexes index data from all dimensions and can effectively use any dimension for combined queries. GIS related functions must be used to maintain data.

Index optimization

1. Independent columns

When performing a query, the index column cannot be part of an expression or a function parameter, otherwise the index cannot be used. For example, the following query cannot use the index on the actor_id column:

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

2. Multi-column index

When you need to use multiple columns as conditions for a query, using a multi-column index will provide better performance than using multiple single-column indexes. For example, in the following statement, it is best to set actor_id and film_id as multi-column indexes.

SELECT film_id, actor_ id FROM sakila.film_actor WHERE actor_id = 1 AND film_id = 1;

3. Order of index columns

Put the most selective index columns first.

The selectivity of an index refers to the ratio of unique index values to the total number of records. The maximum value is 1, in which case each record has a unique index corresponding to it. The higher the selectivity, the more efficient the query.

For example, in the results shown below, customer_id has a higher selectivity than staff_id, so it is best to put the customer_id column at the front of the multi-column index.

SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
COUNT(*)
FROM payment;

staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
 COUNT(*): 16049

4. Prefix Index

For columns of BLOB, TEXT, and VARCHAR types, you must use a prefix index to index only the beginning characters.

The selection of prefix length needs to be determined based on index selectivity.

5. Covering Index

The index contains the values of all the fields that need to be queried.

It has the following advantages:

Indexes are usually much smaller than the size of data rows, and reading only the index can greatly reduce the amount of data access.
Some storage engines (such as MyISAM) cache only indexes in memory and rely on the operating system to cache the data. Therefore, accessing only the index can be done without using system calls (which are usually time-consuming).
For the InnoDB engine, if the secondary index can cover the query, there is no need to access the primary index.

6. Leftmost Prefix Principle

As the name implies, the leftmost is first, and any consecutive index can be matched starting from the leftmost.

The essence of joint index:

When you create a (a,b,c) joint index, it is equivalent to creating a (a) single-column index. If you want the (a,b) joint index and the (a,b,c) joint index to take effect, you can only use the three combinations of a and a,b and a,b,c.

Advantages of Indexes

This greatly reduces the number of data rows that the server needs to scan.
Helps the server avoid sorting and grouping, and avoids creating temporary tables (B+Tree indexes are ordered and can be used for ORDER BY and GROUP BY operations. Temporary tables are mainly created during sorting and grouping. Since sorting and grouping are not required, there is no need to create temporary tables).
Convert random I/O to sequential I/O (B+Tree index is ordered and stores adjacent data together).

Conditions for using indexes

For very small tables, a simple full table scan is more efficient than indexing in most cases;
For medium to large tables, indexes are very effective;
However, for very large tables, the cost of creating and maintaining indexes will increase accordingly. In this case, a technology is needed to directly distinguish a set of data that needs to be queried, rather than matching one record at a time. For example, partitioning technology can be used.

summary

Index is a very important function in MySQL. If you can make good use of indexes in daily development, you can greatly improve the execution performance of SQL statements, so it is very necessary to understand the principles behind it.

The above is the full content of this article. I hope it will be helpful for everyone’s study. I also hope that everyone will support 123WORDPRESS.COM.

You may also be interested in:

Solve MySQL deadlock routine by updating different indexes
Understanding MySQL deadlock routines through unique index S lock and X lock
Share some key interview questions about MySQL index
Index in MySQL
A brief talk about Mysql index and redis jump table
MySQL Learning (VII): Detailed Explanation of the Implementation Principle of Innodb Storage Engine Index
How to add index to mysql using shell script
Solutions to MySQL batch insert and unique index problems
Guide to Efficient Use of MySQL Indexes

<<: 5 Ways to Send Emails in Linux Command Line (Recommended)

>>: vue $set implements assignment of values to array collection objects