MySQL group query optimization method

MySQL handles GROUP BY and DISTINCT queries similarly in most cases; in fact, it sometimes switches between the two during optimization. Both types of queries can benefit from indexing, and this is usually the most important way to optimize them.

When an index cannot be used, MySQL has two strategies for GROUP BY queries: grouping with a temporary table or grouping with a filesort. Either one can be the more efficient choice for a given query. You can use the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints to steer the optimizer toward one method or the other.
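As a minimal sketch of the two hints, the queries below group on a column with no index. This assumes the standard sakila sample schema, where film.rating is not indexed; the table and column are only illustrative and do not come from the examples in this article.

```sql
-- Hint that many groups/rows are expected: the optimizer prefers sorting
-- (filesort) over a temporary table keyed on the GROUP BY columns.
SELECT SQL_BIG_RESULT rating, COUNT(*) AS cnt
FROM sakila.film
GROUP BY rating;

-- Hint that the result is small: the optimizer prefers an in-memory
-- temporary table instead of sorting.
SELECT SQL_SMALL_RESULT rating, COUNT(*) AS cnt
FROM sakila.film
GROUP BY rating;
```

Both are hints, not commands. Prefix each statement with EXPLAIN to see whether the Extra column reports "Using temporary" or "Using filesort" on your server version.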

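When a join needs to be grouped by a value that comes from a lookup table, it is usually more efficient to group by the lookup table's identifier than by the value itself. For example, the following query is less efficient than it could be: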

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.first_name, actor.last_name;

The following query is more efficient:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;

Grouping by actor.actor_id may be even more efficient than grouping by film_actor.actor_id, so it is worth testing both against your own data.

This query takes advantage of the fact that the actor's first and last names depend on actor_id, so it returns the same results. That is not true in general: you cannot always select nongrouped columns and get the same result, and the server's SQL_MODE may even be configured to reject such queries. If you know the values within each group are identical because they depend on the grouping column, or if you do not care which value you get, you can wrap the columns in MIN() or MAX() to work around this:

SELECT MIN(actor.first_name), MAX(actor.last_name), ...;

Purists will argue that you are grouping by the wrong thing, and they are right. A spurious MIN() or MAX() is a sign that the query is not structured correctly. Sometimes, though, your only concern is making MySQL execute the query as quickly as possible. Purists would be satisfied with the following query:

SELECT actor.first_name, actor.last_name, c.cnt
FROM sakila.actor
INNER JOIN (
    SELECT actor_id, COUNT(*) AS cnt
    FROM sakila.film_actor
    GROUP BY actor_id
) AS c USING(actor_id);

However, creating and populating the temporary table that the subquery requires can cost more than bending relational theory slightly. Keep in mind that in older MySQL versions the temporary table built for a subquery has no indexes, which can hurt performance when it is joined; MySQL 5.6 and later may add an index to a materialized derived table automatically.

Selecting nongrouped columns in a grouped query is generally a bad idea, because the results are nondeterministic: they can change if an index changes or if the optimizer picks a different strategy. We recommend setting the server's SQL_MODE to include ONLY_FULL_GROUP_BY, so that a badly written GROUP BY query produces an error instead of running. With ONLY_FULL_GROUP_BY enabled, every nonaggregated column in the SELECT list must appear in the GROUP BY clause (since MySQL 5.7.5, a column that is functionally dependent on the GROUP BY columns is also accepted). If you need other columns, you can split the work into steps or use a subquery: compute the groups first, then join back to fetch the remaining columns.
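A short sketch of what enabling the mode looks like for a single session. The queries against sakila.actor are illustrative assumptions, not part of the examples above.

```sql
-- Inspect the current session mode first.
SELECT @@SESSION.sql_mode;

-- Caution: this REPLACES the session's mode list with the single flag;
-- include any other flags you rely on in the same string.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected under ONLY_FULL_GROUP_BY (error 1055): last_name is neither
-- aggregated nor determined by first_name.
SELECT first_name, last_name, COUNT(*)
FROM sakila.actor
GROUP BY first_name;

-- Accepted: every selected column is grouped or aggregated.
SELECT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;
```

Setting the mode at session scope is a low-risk way to find offending queries before changing the server-wide default.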

MySQL automatically sorts grouped results by the columns listed in GROUP BY, unless an ORDER BY clause specifies a different order. If you do not care about the order and find that this sort causes a filesort, you can add ORDER BY NULL to skip it. You can also put ASC or DESC directly after a GROUP BY column to sort the results in that direction. Note that this behavior applies to MySQL 5.7 and earlier: MySQL 8.0 no longer sorts GROUP BY results implicitly, and the GROUP BY ... ASC/DESC syntax was removed in 8.0.13.
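For illustration, here is ORDER BY NULL applied to a grouped query. It again assumes the sakila sample schema and its unindexed film.rating column, which is not one of this article's own examples.

```sql
-- On versions that sort GROUP BY results implicitly, ORDER BY NULL tells
-- the server that no particular order is needed, so the sort is skipped.
EXPLAIN
SELECT rating, COUNT(*) AS cnt
FROM sakila.film
GROUP BY rating
ORDER BY NULL;
```

Run the same EXPLAIN with and without ORDER BY NULL and compare the Extra column. If "Using filesort" appears in neither plan, the clause is not buying anything on that server.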

You can also ask MySQL to add super-aggregate (subtotal and total) rows to a grouped result by appending WITH ROLLUP to the GROUP BY clause, but this may not be optimized as well as you need. Use EXPLAIN to see how the query is executed, paying attention to whether the grouping is done with a filesort or a temporary table, then remove WITH ROLLUP and check whether the same grouping method is used. Comparing the two plans may reveal ways to optimize.
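The comparison described above might look like this, using the film_actor table from the earlier examples:

```sql
-- Plan with super-aggregation: one extra row carries the grand total.
EXPLAIN
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id WITH ROLLUP;

-- Plan for the same grouping without the rollup row.
EXPLAIN
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id;
```

If the first plan shows "Using temporary" or "Using filesort" in the Extra column and the second does not, the rollup is what changes the grouping method.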

Sometimes it is more efficient to compute the super-aggregates in the application, even though this means fetching more rows from the server. You can also nest a subquery in the FROM clause, or use a temporary table, to hold the intermediate results, and then combine them with UNION to produce the final result.
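One way to sketch the UNION approach is with a common table expression, which requires MySQL 8.0 or later. That version requirement and the per_actor name are assumptions made for this illustration.

```sql
-- Compute the per-group counts once, then append a grand-total row.
WITH per_actor AS (
    SELECT actor_id, COUNT(*) AS cnt
    FROM sakila.film_actor
    GROUP BY actor_id
)
SELECT actor_id, cnt
FROM per_actor
UNION ALL
-- NULL in the actor_id column marks the total row, as WITH ROLLUP does.
SELECT NULL, SUM(cnt)
FROM per_actor;
```

On older versions the same shape can be written as two grouped SELECTs joined by UNION ALL, at the cost of reading film_actor twice.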

In many cases, though, the best approach is to take WITH ROLLUP out of the query and implement that functionality in your application code.

Conclusion: When using GROUP BY, group on indexed columns wherever possible. If you do not need the results in a particular order, ORDER BY NULL avoids the implicit sort on versions that perform one. If you cannot group on indexed columns, consider alternatives such as subqueries, and use EXPLAIN to check how a WITH ROLLUP query is executed before trying to optimize it. To keep nondeterministic grouped queries from running unnoticed, enable ONLY_FULL_GROUP_BY.
