MySQL Quick Data Comparison Techniques

MySQL Quick Data Comparison Techniques

In MySQL operation and maintenance, a R&D colleague wants to compare the data on two different instances and find the differences. In addition to the primary key, every field needs to be compared. How to do it?

The first solution is to write a program to extract each row of data from the two instances for comparison. This is theoretically feasible, but the comparison time is long.

The second solution is to merge all the fields in each row of data, take the checksum value, and then compare according to the checksum value. This seems feasible, so give it a try.

First, we need to merge the values ​​of all fields and use the CONCAT function provided by MySQL. If the CONCAT function contains NULL values, the final result will be NULL. Therefore, we need to use the IFNULL function to replace the NULL values, such as:

CONCAT(IFNULL(C1,''),IFNULL(C2,''))

The table you are joining has many rows, and it is tiring to manually compose a script. Don't worry, you can use information_schema.COLUMNS to handle it:

## Get the concatenated string of column names SELECT
GROUP_CONCAT('IFNULL(',COLUMN_NAME,','''')')
FROM information_schema.COLUMNS 
WHERE TABLE_NAME='table_name';

Assume we have test table:

CREATE TABLE t_test01
(
 id INT AUTO_INCREMENT PRIMARY KEY,
 C1 INT,
 C2 INT
)

We can then splice out the following SQL:

SELECT
id,
MD5(CONCAT(
IFNULL(id,''),
IFNULL(c1,''),
IFNULL(c2,''),
)) AS md5_value
FROM t_test01

Execute it on two instances, and then compare the results using Beyond Compare. It is easy to find the different rows and primary key IDs.

For tables with large amounts of data, the result set is also large and difficult to compare. So first try to reduce the result set. You can combine the md5 values ​​of multiple rows to calculate the MD5 value. If the final MD5 values ​​are the same, then these rows are the same. If they are different, it proves that there are differences. Then compare these rows line by line.

Assume that we compare in groups of 1,000 rows. If we need to merge the grouped results, we need to use the GROUP_CONCAT function. Note that we need to add sorting in the GROUP_CONCAT function to ensure the order of the merged data. The SQL is as follows:

SELECT
min(id) as min_id,
max(id) as max_id,
count(1) as row_count,
MD5(GROUP_CONCAT(
MD5(CONCAT(
IFNULL(id,''),
IFNULL(c1,''),
IFNULL(c2,''),
)) ORDER BY id
))AS md5_value
FROM t_test01
GROUP BY (id div 1000)

The execution results are:

min_id max_id row_count md5_value
0 999 1000 7d49def23611f610849ef559677fec0c
1000 1999 1000 95d61931aa5d3b48f1e38b3550daee08
2000 2999 1000 b02612548fae8a4455418365b3ae611a
3000 3999 1000 fe798602ab9dd1c69b36a0da568b6dbb

When there are fewer different data, even if we need to compare tens of millions of data, we can easily locate the 1,000 data with differences based on min_id and max_id, and then compare the MD5 values ​​line by line to finally find the different rows.

Final comparison chart:

PS:

When using GROUP_CONCAT, you need to configure the MySQL variable group_concat_max_len. The default value is 1024, and the excess will be staged.

You may also be interested in:
  • MySQL 5.7.20 common download, installation and configuration methods and simple operation skills (decompression version free installation)
  • How to use Java Web to connect to MySQL database
  • How to use tcpdump to capture packets for mysql
  • Detailed explanation of 30 SQL query optimization techniques for MySQL tens of millions of large data
  • Mysql optimization techniques for querying dates based on time
  • 10 SQL statement optimization techniques to improve MYSQL query efficiency
  • Summary of common problems and application skills in MySQL
  • 5 Tips for Protecting Your MySQL Data Warehouse
  • Share 101 MySQL debugging and optimization tips
  • MySql Sql optimization tips sharing
  • Summary of MySQL injection bypass filtering techniques
  • Summary of common operation skills of MySQL database

<<:  How to implement nginx smooth restart

>>:  JS Decorator Pattern and TypeScript Decorators

Recommend

How to draw a mind map in a mini program

Table of contents What is a mind map? How to draw...

MySQL 5.7.10 installation and configuration tutorial under Windows

MySQL provides two different versions for differe...

Comparison of the efficiency of different methods of deleting files in Linux

Test the efficiency of deleting a large number of...

How to block and prohibit web crawlers in Nginx server

Every website usually encounters many non-search ...

Various types of MySQL indexes

What is an index? An index is a data structure th...

Comparison between Redis and Memcache and how to choose

I've been using redis recently and I find it ...

mysql 5.7.23 winx64 decompression version installation tutorial

Detailed installation tutorial of mysql-5.7.23-wi...

Tutorial on installing PHP on centos via yum

First, let me introduce how to install PHP on Cen...

Let's talk about the storage engine in MySQL

Basics In a relational database, each data table ...

Summary of several commonly used CentOS7 images based on Docker

Table of contents 1 Install Docker 2 Configuring ...

Linux platform mysql enable remote login

During the development process, I often encounter...

Solution to changing the data storage location of the database in MySQL 5.7

As the data stored in the MySQL database graduall...

CSS3 speeds up and delays transitions

1. Use the speed control function to control the ...

Detailed tutorial on installing and configuring MySql5.7 on Ubuntu 20.04

Table of contents 1. Ubuntu source change 2. Inst...

Completely uninstall MySQL database in Windows system to reinstall MySQL

1. In the control panel, uninstall all components...