MySQL deduplication methods

MySQL deduplication methods

MySQL deduplication methods

【Beginner】There are very few repeated lines

Use distinct to find them out, and then delete them manually one by one.

[Intermediate] Remove duplicates based on a single field

For example: remove duplicates from the id field

How to use: Get the value of the repeated field of id, compare the fields with different data in the rows with the same id field, and delete all repeated rows except the row with the smallest (or largest) field. Generally, the primary key is used for comparison because the value of the primary key must be unique and absolutely different.

id name

1 a

1 b

2 c

2 a

3 c

result:

id name

1 a 

2 a

operate:

delete from a_tmp 

where id in (select * from (select b.id from a_tmp b group by b.id having count(b.id) >1) bb) 

and name not in (select * from (select min(a.name) from a_tmp a GROUP BY a.id having count(a.id) >1) aa);

Notice:

The bold and green words above must be aliased and must use the select * from (……) format, otherwise an error will be reported:

[Err] 1093 - You can't specify target table 'a_tmp' for update in FROM clause

[Advanced] Remove duplicates by repetition of multiple fields

For example, to remove duplicates of the same id and name, that is, to count the same id and name as duplicate rows, and to count the same id but different names as non-duplicate rows

Usage: Similar to a single field, the primary key is generally used for comparison because the value of the primary key must be unique.

id name rowid

1a1

1 a 2

1 b 3

2 b 4

2 b 5

3 c 6

3d7

result:

id name rowid

1a1 

1 b 3

2 b 4

3 c 6

3d7

operate:

The first one:

delete from a_tmp 

where (id,name) in (select * from (select b.id,b.name from a_tmp b group by b.id,b.name having count(b.id) >1) bb) 

and rowid not in (select * from (select min(a.rowid) from a_tmp a group by a.id,a.name having count(a.id) >1) aa);

Second type:

Connect the values ​​of the id and name fields and insert them into the temporary table b_tmp. Then you can use the [Intermediate] single field judgment and deletion method.

#Insert the value of the two fields connected and the field with the unique value in the a_tmp table into the b_tmp table

insert into b_tmp 

 select concat(id,name),rowid from a_tmp;

#Find out the rows that need to be kept select id_name,max(rowid)

 from b_tmp 

 group by id_name

 having count(id_name)>1;

#Use the [Intermediate] method or stored procedure to complete the deduplication work

[Ultimate] Each row has two copies of the same data

For example:

How to use: If the data in the entire row is the same, it is impossible to use SQL statements to delete it, because there is no conditional restriction that can be used to keep one row and delete all the identical rows. There are no different fields that can be created by yourself, that is: add a field, set it to auto-increment, and set it as the primary key, it will automatically add values.

id name

1 a

1 a

1 b

1 b

2 c

2 c

3 c

3 c

result:

id name rowid

1a1

1 b 3

2 c 5

3 c 7

operate:

Add an auto-increment field and temporarily set it as the primary key.

Use the [Intermediate] and [Advanced] methods above.

Thank you for reading, I hope it can help you, thank you for your support of this site!

You may also be interested in:
  • MySQL development skills: JOIN update and data duplication check/deduplication
  • MySQL data duplicate checking and deduplication implementation statements
  • One sql statement completes MySQL deduplication and keeps one
  • A brief discussion on deduplication in SQL database
  • A small example of SQL grouping and sorting to remove duplicates
  • Detailed explanation of two methods of deduplication in MySQL and example code
  • A practical record of how to check and remove duplicate SQL

<<:  CentOS7 deployment Flask (Apache, mod_wsgi, Python36, venv)

>>:  Detailed explanation of Docker+Jenkins+Gitlab+Django application deployment practice

Recommend

Promise encapsulation wx.request method

The previous article introduced the implementatio...

Vue Element front-end application development table list display

1. List query interface effect Before introducing...

MySQL trigger detailed explanation and simple example

MySQL trigger simple example grammar CREATE TRIGG...

MySQL REVOKE to delete user permissions

In MySQL, you can use the REVOKE statement to rem...

Basic usage knowledge points of mini programs (very comprehensive, recommended!)

Table of contents What to do when registering an ...

Introduction to Nginx log management

Nginx log description Through access logs, you ca...

Detailed explanation of how to use CMD command to operate MySql database

First: Start and stop the mysql service net stop ...

Summary of various methods of implementing article dividing line styles with CSS

This article summarizes various ways to implement...

How to use Maxwell to synchronize MySQL data in real time

Table of contents About Maxwell Configuration and...

Tips for optimizing MySQL SQL statements

When faced with a SQL statement that is not optim...

Tutorial on installing and uninstalling python3 under Centos7

1. Install Python 3 1. Install dependency package...

Detailed explanation of the basic functions and usage of MySQL foreign keys

This article uses examples to illustrate the basi...

Summary of 10 common HBase operation and maintenance tools

Abstract: HBase comes with many operation and mai...

Implementation of Docker deployment of Django+Mysql+Redis+Gunicorn+Nginx

I. Introduction Docker technology is very popular...