One sql statement completes MySQL deduplication and keeps one

One sql statement completes MySQL deduplication and keeps one

A few days ago, when I was working on a requirement, I needed to clean up duplicate records in MySQL. At that time, I thought of writing it out through code traversal, but then I felt it was too complicated, and I thought that the problem should be solved through a SQL statement. After checking the information and consulting the experts, I came up with a very convenient SQL statement. Here I share this SQL statement and ideas.

Demand Analysis

There are duplicate records in the database, delete one and keep one (the basis for determining whether it is a duplicate is multiple fields)

Solution

When I encountered this requirement, I probably had some ideas in mind. The first thing that comes to my mind is that it can be solved with a SQL statement, but I am not very experienced in complex SQL statements, so I want to ask an expert for help.

Find someone to help

Because this requirement is a bit urgent, the first thing I thought of was to find a colleague in this field to solve it, and then share this problem with my colleague. As a result, this guy just Baidu and gave me a SQL statement that I had never used before, and asked me to try it myself. A lot of things rushed through my heart...

Baidu

Found a sql statement:

DELETE
FROM
 vitamin a
WHERE
 (a.peopleId, a.seq) IN (
  SELECT
   peopleId,
   seq
  FROM
   vitae
  GROUP BY
   peopleId,
   seq
  HAVING
   count(*) > 1
 )
AND rowid NOT IN (
 SELECT
  min(rowid)
 FROM
  vitae
 GROUP BY
  peopleId,
  seq
 HAVING
  count(*) > 1
)

This statement is found in the article [Delete duplicate data in MySQL and keep only one]. The idea of ​​this SQL statement is very clear, and there are three steps:

SELECT peopleId, seq FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1 to find duplicate records in the table as a condition

SELECT min(rowid) FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1 Query the smallest ID value in the duplicate records in the table as the second condition

Finally, according to the above two conditions, delete the remaining duplicate records except the smallest ID in the duplicate records

But unfortunately, an error occurred when running this statement. The general meaning of the error is that the table cannot be updated at the same time as the query.

Code Solution

Based on the above SQL statement, it is possible to achieve the same purpose in two steps through code:

Remove the duplicate data sets first

According to the queried data set, loop to delete the remaining duplicate data

I had the idea and wrote it quickly, but I was shocked when I ran it. It took about 116 seconds . Then I thought I must find a SQL statement that can be used. I posted the code and the running results:

Perfect [Remove Duplicates and Keep One] SQL

Finally, I got the perfect answer in a technical group. Look at this SQL statement:

DELETE consum_record
FROM
 consum_record, 
 (
  SELECT
   min(id) id,
   user_id,
   monetary,
   consume_time
  FROM
   consum_record
  GROUP BY
   user_id,
   monetary,
   consume_time
  HAVING
   count(*) > 1
 ) t2
WHERE
 consum_record.user_id = t2.user_id 
 and consum_record.monetary = t2.monetary
 and consum_record.consume_time = t2.consume_time
AND consum_record.id > t2.id;

If you look carefully at the above SQL statement, it is not difficult to figure out the idea. It can be understood in three steps:

(SELECT min(id) id, user_id, monetary, consume_time FROM consum_record GROUP BY user_id, monetary, consume_time HAVING count(*) > 1 ) t2 (temporary table t2), which contains the minimum ID of each duplicate record

consum_record.user_id = t2.user_id and consum_record.monetary = t2.monetary and consum_record.consume_time = t2.consume_time are the fields that are used to determine the duplication basis.

According to the conditions, delete the records in the original table whose id is greater than the id in t2

When I saw this sentence, I thought it was amazing. Such a simple SQL statement can actually solve such a complex problem. It's really interesting!

It also runs super fast. The original code loop execution takes about 116 seconds , but here it only takes 0.3 seconds . Amazing~

Summarize

As a PHP programmer, it stands to reason that SQL should not lag behind. However, in reality, there are too many things to do and my current SQL level is only at an average level. I will find an opportunity to improve my knowledge in this area in the future.

That’s all for today.

You may also be interested in:
  • MySQL development skills: JOIN update and data duplication check/deduplication
  • MySQL data duplicate checking and deduplication implementation statements
  • MySQL deduplication methods
  • A brief discussion on deduplication in SQL database
  • A small example of SQL grouping and sorting to remove duplicates
  • Detailed explanation of two methods of deduplication in MySQL and example code
  • A practical record of how to check and remove duplicate SQL

<<:  js implements simple provincial, municipal and district three-level selection cascade

>>:  Detailed analysis of when tomcat writes back the response datagram

Recommend

Goodbye Docker: How to Transform to Containerd in 5 Minutes

Docker is a very popular container technology. Th...

Practice of realizing Echarts chart width and height adaptation in Vue

Table of contents 1. Install and import 2. Define...

Native Js implementation of calendar widget

This article example shares the specific code of ...

VMware configuration VMnet8 network method steps

Table of contents 1. Introduction 2. Configuratio...

MySQL explain obtains query instruction information principle and example

explain is used to obtain query execution plan in...

Detailed usage of Vue timer

This article example shares the specific code of ...

How to write a picture as a background and a link (background picture plus link)

The picture is used as the background and the lin...

How to use css variables in JS

How to use css variables in JS Use the :export ke...

Methods for defragmenting and reclaiming space in MySQL tables

Table of contents Causes of MySQL Table Fragmenta...

Detailed explanation of how to restore database data through MySQL binary log

Website administrators often accidentally delete ...

MySQL study notes on handling duplicate data

MySQL handles duplicate data Some MySQL tables ma...

MySQL 5.7.11 zip installation and configuration method graphic tutorial

1. Download the MySQL 5.7.11 zip installation pac...

js realizes two-way data binding (accessor monitoring)

This article example shares the specific code of ...

Sample code for configuring nginx to support https

1. Introduction Are you still leaving your websit...