MySQL group query optimization method

MySQL optimizes GROUP BY and DISTINCT queries similarly in many cases; in fact, it converts between the two internally as needed during optimization. Both types of queries benefit from indexes, and indexing is usually the most important way to optimize them.

When an index cannot be used, MySQL has two strategies for GROUP BY queries: grouping with a temporary table or grouping with a filesort. Either one can be the more efficient choice for a given query. You can use the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints to steer the optimizer toward one method or the other.
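As a minimal sketch of how these hints are written, the hint keyword goes directly after SELECT. The queries below assume the sakila sample database used elsewhere in this article and group by actor.first_name, which is assumed here to have no index. Which hint helps, if either, depends on your data, so treat the choice as something to measure.

```sql
-- Hint that the result set is large: MySQL prefers sorting
-- over a temporary table keyed on the GROUP BY columns.
SELECT SQL_BIG_RESULT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;

-- Hint that the result set is small: MySQL prefers an
-- in-memory temporary table over sorting.
SELECT SQL_SMALL_RESULT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;
```

Running EXPLAIN on each variant shows whether the plan actually changed.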

When you need to group a join by a value that comes from a lookup table, it is usually more efficient to group by the lookup table's identifier than by the value itself. For example, the following query is inefficient:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.first_name, actor.last_name;

The following query is more efficient:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;

Grouping by actor.actor_id could be even more efficient than grouping by film_actor.actor_id; benchmark both on your own data to find out.

This query takes advantage of the fact that the actor's first and last names depend on actor_id, so it returns the same results. However, you cannot always select non-grouped columns and still get the same result, and the server's SQL_MODE may even be configured to disallow it. You can work around this with MIN() or MAX() when you know the values within each group are identical because they depend on the grouping column, or when you do not care which value you get:

SELECT MIN(actor.first_name), MAX(actor.last_name), ...;
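Written out in full, the workaround might look like the following sketch. The COUNT(*) column and the FROM, JOIN, and GROUP BY clauses are assumed to be carried over from the earlier query; the snippet above elides them.

```sql
-- Every first_name and last_name within one actor_id group is
-- identical, so MIN() and MAX() simply return that single value.
SELECT MIN(actor.first_name), MAX(actor.last_name), COUNT(*)
FROM sakila.film_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;
```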

Purists will argue that you are grouping by the wrong thing, and they are right. A spurious MIN() or MAX() is a sign that the query is not structured correctly. However, sometimes your only concern is making MySQL execute the query as quickly as possible. The purists would be satisfied with the following way of writing the query:

SELECT actor.first_name, actor.last_name, c.cnt
FROM sakila.actor
INNER JOIN (
    SELECT actor_id, COUNT(*) AS cnt
    FROM sakila.film_actor
    GROUP BY actor_id
) AS c USING(actor_id);

However, the cost of creating and populating the temporary table that the subquery requires can be high compared with the cost of bending relational theory a little. Keep in mind that in older MySQL versions the temporary table built for the subquery has no indexes, which can hurt performance when it is joined.

In general, selecting non-grouped columns in a grouped query is a bad idea because the results are nondeterministic: they can change when an index changes or when the optimizer chooses a different strategy. We recommend setting the server's SQL_MODE to include ONLY_FULL_GROUP_BY, so that a badly written GROUP BY query produces an error instead of silently executing. With ONLY_FULL_GROUP_BY enabled, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function (MySQL 5.7.5 and later also accept columns that are functionally dependent on the grouped columns). If you need other columns, build the query in steps or use a subquery: compute the grouped result first, then join it to fetch the remaining columns.
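A small sketch of the effect, again assuming the sakila sample database. The SET statement below replaces the session's whole sql_mode for simplicity; on a real server you would normally add ONLY_FULL_GROUP_BY to the existing modes instead.

```sql
-- Enable ONLY_FULL_GROUP_BY for this session only.
-- Note: this overwrites any other modes set for the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected with this mode on: film_id is neither grouped nor
-- aggregated, and it is not determined by actor_id.
-- SELECT actor_id, film_id, COUNT(*)
-- FROM sakila.film_actor
-- GROUP BY actor_id;

-- Accepted: every selected column is grouped or aggregated.
SELECT actor_id, MIN(film_id) AS first_film_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id;
```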

In MySQL 5.7 and earlier, MySQL automatically sorts grouped results by the columns in the GROUP BY clause unless an ORDER BY clause specifies a different order. If you don't care about the order and the implicit sort causes a filesort, you can add ORDER BY NULL to skip it. In those versions you can also add ASC or DESC after a GROUP BY column to sort the results in a specific direction. MySQL 8.0 no longer sorts GROUP BY results implicitly, so an explicit ORDER BY is the only way to guarantee an order there.
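For illustration, a sketch of ORDER BY NULL on a grouping column that is assumed to have no index (actor.first_name in the sakila sample database). Compare the Extra column of the two EXPLAIN outputs on your own server to see whether the sort step disappears; the difference is only expected on versions that sort GROUP BY results implicitly.

```sql
-- Grouping only; the result order is whatever the server produces.
EXPLAIN
SELECT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;

-- Same grouping, explicitly telling MySQL that no ordering is needed.
EXPLAIN
SELECT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name
ORDER BY NULL;
```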

Sometimes you can ask MySQL to add super-aggregate rows to a grouped result. You do this by adding the WITH ROLLUP clause after GROUP BY, but the result may not be as well optimized as you need. Check the execution plan with EXPLAIN, paying attention to whether the grouping is done with a filesort or a temporary table, and then compare it with the plan for the same query without WITH ROLLUP. The comparison may reveal ways to optimize the query.
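A minimal sketch of that comparison, assuming the sakila sample database. WITH ROLLUP adds one extra row whose actor_id is NULL and whose cnt is the grand total.

```sql
-- Per-actor counts plus a final super-aggregate row (actor_id IS NULL).
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id WITH ROLLUP;

-- Plan with the super-aggregation.
EXPLAIN
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id WITH ROLLUP;

-- Plan for the same grouping without it, for comparison.
EXPLAIN
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id;
```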

Sometimes it is more efficient to do the super-aggregation in your application, even if that means fetching more rows from the server. You can also nest a subquery in the FROM clause, or use a temporary table to hold intermediate results, and then use UNION to produce the final result.
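One way the UNION approach could look, as a sketch against the sakila sample database: the per-group rows and the grand-total row are computed by separate queries and then combined. Whether this beats WITH ROLLUP depends on your data and indexes, so compare both.

```sql
-- Per-actor counts.
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id
UNION ALL
-- Grand total, marked by a NULL actor_id like a ROLLUP row.
SELECT NULL, COUNT(*)
FROM sakila.film_actor;
```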

Often, though, the best approach is to move the WITH ROLLUP functionality into your application code.

Conclusion: When using GROUP BY, group by indexed columns where possible. If the order of the results does not matter, ORDER BY NULL can avoid an unnecessary sort. When you cannot group by indexed columns, consider alternatives such as a subquery, and use EXPLAIN to check how WITH ROLLUP and other variants perform before settling on one. To catch nondeterministic grouped queries early, it is best to enable ONLY_FULL_GROUP_BY.
