Preface: The reason for working overtime is to go online and solve the problem of duplicate data in the online database. There are duplicate data in 6 tables in the online library, 2 of which are relatively large, one with more than 960,000 and the other with more than 300,000. Because I have dealt with the same problem before, I directly used the Emmmm, but the efficiency is too low, one piece per second, about 20,000+ duplicate data, and the estimated time is about 8 hours. . . There is a problem in blindly relying on the things of predecessors without thinking for yourself! Always wondering why it worked before and why it doesn’t work now is also a problem! I found that I have been in a bad state recently and have lost the desire to explore and seek knowledge. Today is a wake-up call for me and I feel like I have finally found my way back to the right path. Now, let's get back to the topic. The deduplication steps are introduced in detail below. 1. Discover the problemCREATE TABLE `animal` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(20) DEFAULT NULL, `age` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin; INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('1', 'cat', '12'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('2', 'dog', '13'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('3', 'camel', '25'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('4', 'cat', '32'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('5', 'dog', '42'); Goal: We want to remove data with the same First, see which data is repeated. SELECT name,count( 1 ) FROM student GROUP BY NAME HAVING count( 1 ) > 1; Output:
The data Select * From Table Where Repeating Field In (Select Repeating Field From Table Group By Repeating Field Having Count(1)>1) 2. Delete all duplicate data without leaving anyDirect deletion will result in an error. DELETE FROM student WHERE NAME IN ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1) Error: 1093 - You can't specify target table 'student' for update in FROM clause, Time: 0.016000s The reason is: while updating this table, this table is queried, while querying this table, this table is updated again, which can be understood as a deadlock. MySQL does not support this operation of updating and querying the same table. Solution: Query the columns of data to be updated as a third-party table, and then filter and update them. DELETE FROM student WHERE NAME IN ( SELECT t.NAME FROM ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1 ) t) 3. Delete duplicate data from the delete table and keep only oneBefore deleting, we can check what kind of duplicate data we want to delete. SELECT * FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) What does this mean? First, we group by 4. Start deleting duplicate data and keep only one It's very simple. Just replace DELETE FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) The execution of a table with 900,000+ records is super fast. This is the end of this article about how to filter out duplicate data during You may also be interested in:
|
>>: base target="" controls the link's target open frame
Table of contents 1. Project Environment 2. Proje...
Problem Description html <iframe id="h5Co...
<br />For some time, I found that many peopl...
Before reading this article, I hope you have a pr...
Introduction to IPSec IPSec (Internet Protocol Se...
1. Check whether port 80 is occupied. Generally, ...
About password strength verification: [root@mysql...
Table of contents 1. Let’s start with the conclus...
Using NULL in comparison operators mysql> sele...
Table of contents 1. Virtual Host 1.1 Virtual Hos...
This tutorial shares the installation of mysql in...
The most important logs in the MySQL log system a...
Today, let's talk about a situation that is o...
Introduction MySQL 5.7 aims to be the most secure...
Operating system: Window10 MySQL version: 8.0.13-...