Should I use distinct or group by to remove duplicates in MySQL?

Performance comparison of distinct and group by for deduplication:

  • No index, small amount of data with few types: almost the same
  • No index, small amount of data with many types: distinct is slightly better
  • No index, large amount of data with many types: distinct is better
  • With index, all three data sets: almost the same

For deduplication, prefer distinct when the column has no index. When the column is indexed, either distinct or group by can be used.

Preface

The conclusions commonly found online about group by versus distinct are as follows. Without an index, distinct performs better on a small amount of data and group by performs better on a large amount. With an index, group by performs better. When the index is used, the fewer the grouping types, the faster distinct is. This article tests those conclusions.

Preparation phase: disable the query cache

First check whether the query cache is enabled in MySQL. To keep cached results from distorting the timings, turn it off.

show variables like '%query_cache%'; 

Whether the query cache is active is determined by query_cache_type and query_cache_size.

  • Method 1: Turn off the query cache in the configuration file. Edit C:\ProgramData\MySQL\MySQL Server 5.7\my.ini and set query_cache_type to 0 or 2.
  • Method 2: Set query_cache_size to 0 by executing the following statement.
set global query_cache_size = 0;
  • Method 3: If you don't want to turn off the query cache, you can clear it with RESET QUERY CACHE.

In this test environment query_cache_type=2, which means caching on demand: queries are not cached by default, and a query is cached only if sql_cache is added to the statement.
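As a minimal sketch of how on-demand caching is controlled per statement, the following assumes MySQL 5.7 (the query cache and these modifiers were removed in MySQL 8.0) and uses t0, this article's test table, purely as an example target.

```sql
-- Inspect the two settings that decide whether the query cache is active.
show variables like 'query_cache_type';
show variables like 'query_cache_size';

-- With query_cache_type=2 (DEMAND), only statements that ask for it are cached.
select sql_cache a from t0 group by a;

-- Bypass the cache explicitly for a single statement, whatever the setting.
select sql_no_cache a from t0 group by a;
```

Adding sql_no_cache to each timed query is an alternative to changing server settings when the server is shared.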

Data preparation

Table t0 stores 100,000 rows: a small amount of data with few types

drop table if exists t0;
create table t0(
  id bigint primary key auto_increment,
  a varchar(255) not null
) engine=InnoDB default charset=utf8mb4 collate=utf8mb4_bin;

-- Fill t0 with num rows; every 1000 consecutive rows share one value of a.
drop procedure if exists insert_t0_simple_category_data_sp;
delimiter //
create procedure insert_t0_simple_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    insert into t0(a) values (truncate(@i/1000, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
-- 100,000 rows, 100 distinct values of a
call insert_t0_simple_category_data_sp(100000);

Table t1 stores 10,000 rows: a small amount of data with many types

drop table if exists t1;
create table t1 like t0;

-- Fill t1 with num rows; every 10 consecutive rows share one value of a.
drop procedure if exists insert_t1_complex_category_data_sp;
delimiter //
create procedure insert_t1_complex_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    insert into t1(a) values (truncate(@i/10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
-- 10,000 rows, 1,000 distinct values of a
call insert_t1_complex_category_data_sp(10000);

Table t2 stores 5 million rows: a large amount of data with many types

drop table if exists t2;
create table t2 like t1;

-- Fill t2 with num rows; every 10 consecutive rows share one value of a.
drop procedure if exists insert_t2_complex_category_data_sp;
delimiter //
create procedure insert_t2_complex_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    -- insert into t2, not t1
    insert into t2(a) values (truncate(@i/10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
-- 5,000,000 rows, 500,000 distinct values of a
call insert_t2_complex_category_data_sp(5000000);
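Before timing anything, it may be worth confirming that each table really has the intended shape. The query below is a sketch of such a check; the column aliases tbl, row_count, and distinct_values are illustrative names, not part of the original test.

```sql
-- One row per test table: total rows and number of distinct values of a.
select 't0' as tbl, count(*) as row_count, count(distinct a) as distinct_values from t0
union all
select 't1', count(*), count(distinct a) from t1
union all
select 't2', count(*), count(distinct a) from t2;
```

If each procedure filled its own table as intended, t0 should report 100,000 rows with 100 distinct values, t1 10,000 rows with 1,000, and t2 5,000,000 rows with 500,000.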

Testing Phase

Verify a small amount of data with few types

Not indexed

set profiling = 1;
select distinct a from t0;
show profiles;
select a from t0 group by a;
show profiles;

With a small amount of data and few types, and no index, distinct and group by perform almost the same.
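show profiles reports only the total duration of each statement. As a sketch of how to see where that time goes, show profile can break a single statement down into stages. The query IDs 1 and 2 below are assumptions; substitute whatever Query_ID values show profiles lists for the two statements in your session.

```sql
set profiling = 1;
select distinct a from t0;
select a from t0 group by a;

-- List the recorded statements with their Query_ID and total duration.
show profiles;

-- Per-stage breakdown of one statement (IDs are assumed, see above).
show profile for query 1;
show profile for query 2;
```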

Add index

alter table t0 add index `a_t0_index`(a);

Run the same distinct and group by queries again with profiling enabled.

With a small amount of data and few types, distinct and group by perform almost the same once the index is added.
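One way to see why the two statements converge once the index exists is to compare their execution plans. This is a minimal sketch, assuming the index a_t0_index has already been created; the exact plan depends on the MySQL version and table statistics, so no output is shown.

```sql
-- Compare how the optimizer resolves each form of deduplication.
explain select distinct a from t0;
explain select a from t0 group by a;
```

If both plans show the same key and the same Extra text (for example Using index for group-by), the optimizer is resolving both statements with the same index scan, which would account for the near-identical timings.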

Verify a small amount of data with many types

Not indexed

Run the same two queries against t1 without an index.

With a small amount of data and many types, and no index, distinct performs slightly better than group by, but the difference is not large.

Add index

alter table t1 add index `a_t1_index`(a);

Run the same two queries against t1 again, now with the index.

With a small amount of data and many types, distinct and group by perform almost the same once the index is added.

Verify a large amount of data with many types

Not indexed

SELECT count(1) FROM t2; 

After confirming the row count, run the same two queries against t2 without an index.

With a large amount of data and many types, and no index, distinct performs better than group by.
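A likely contributor to this gap, assuming MySQL 5.7 as the configuration path in the preparation step suggests: in that version group by also sorts its result by the grouped columns, while distinct does not. The MySQL 5.7 manual describes order by null as a way to suppress that sort. The sketch below tests this assumption; it is not something measured in the original test.

```sql
set profiling = 1;

-- Baseline: the two statements compared in this article.
select distinct a from t2;
select a from t2 group by a;

-- Same grouping with the implicit sort suppressed (MySQL 5.7 behavior).
select a from t2 group by a order by null;

show profiles;
```

If the third statement runs close to the distinct timing, the sort is the main cost difference; if not, the cause lies elsewhere.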

Add index

alter table t2 add index `a_t2_index`(a);

Run the same two queries against t2 again, now with the index.

With a large amount of data and many types, distinct and group by perform almost the same once the index is added.
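To repeat the unindexed measurements after the indexed ones, the indexes can be dropped again. A minimal sketch, assuming the index names created in the steps above:

```sql
-- Return each table to its unindexed state so the first round of tests can be rerun.
alter table t0 drop index a_t0_index;
alter table t1 drop index a_t1_index;
alter table t2 drop index a_t2_index;
```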
