MySQL method steps to determine whether it is a subset

MySQL method steps to determine whether it is a subset

1. Problem

The story originated from a report that queries the error and omission rate: there are two query results, one for the items that have been added to the report and the other for the items that should be added to the report.

What does it mean to be without omission? That is, all items that should be added have been added

The reporting completeness rate is the ratio of the number of complete reports to the total number of reports.

Here are two examples of reports (one with all added and one with missing parts)

First, find the first result - the items that should be added to the report

SELECT 
     r.id AS report ID, m.project_id should be added to the project FROM 
  report 
  INNER JOIN application a ON r.app_id=a.id
  INNER JOIN application_sample s ON a.id=s.app_id
  RIGHT JOIN application_sample_item si ON s.id=si.sample_id       
  RIGHT JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id,m.project_id;

Then, find the second result - report the items that have been added

SELECT r.id AS report_id,i.project_id AS added_project FROM report r 
RIGHT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927');

The above is the result set we want to compare. It is not difficult to see that report 44927 is complete, while 44930 has the same number of items, but actually adds 758 items and lacks 112 items, so it is a missing report.

2. Solution

Judging from the question, it is obviously a question of judging whether it is a subset. You can traverse the added items and the items that should be added separately. If the items that should be added can be matched in the added items, it means that the items that should be added are a subset of the added items, that is, there are no omissions.

This problem can indeed be solved by looping and comparing, but the cross join of Cartesian products in SQL often means huge overhead and slow query speed. So is there any way to avoid this problem?

Option 1:

With the help of the functions FIND_IN_SET and GROUP_CONCAT, first understand the following two functions

FIND_IN_SET(str,strlist)

  • str: the string to be queried
  • strlist: parameters are separated by English "," such as (1,2,6,8,10,22)

The FIND_IN_SET function returns the position of the string to be queried in the target string.

GROUP_CONCAT( [distinct] Field to be connected [order by sort field asc/desc ] [separator 'separator'] )

The GROUP_CONCAT() function can concatenate the values ​​of the same field of multiple records into one record and return it. The default separator is English ',' .

However, the default length of GROUP_CONCAT() is 1024

Therefore, if the length of the splicing needs to exceed 1024, it will cause incomplete truncation and the length needs to be modified.

SET GLOBAL group_concat_max_len=102400;
SET SESSION group_concat_max_len=102400;

From the above two function introductions, we find that FIND_IN_SET and GROUP_CONCAT are separated by English ',' (in bold)

Therefore, we can use GROUP_CONCAT to concatenate the items that have been added into a string, and then use FIND_IN_SET to query one by one whether the items to be added exist in the string.

1. Modify the SQL in the description of the problem and use GROUP_CONCAT to concatenate the items of the added items into a string

SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
LEFT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927')
GROUP BY r.id; 

2. Use FIND_IN_SET to check one by one whether the items to be added exist in the string

SELECT Q.id,FIND_IN_SET(W.List of items to be added, Q.List of items already added) AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
      WHERE r.id IN ('44930','44927')
      ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id;

3. Filter out missed reports

 SELECT Q.id,CASE WHEN FIND_IN_SET(W.List of items to be added, Q.List of items already added)>0 THEN 1 ELSE 0 END AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
      WHERE r.id IN ('44930','44927')
      ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id
   GROUP BY Q.id
   HAVING COUNT(`Is it missing`)=SUM(`Is it missing`);

4. Our ultimate goal is to find the zero omission rate

 SELECT COUNT(X.id) number of reports without missing items, Y.total total number of reports, CONCAT(FORMAT(COUNT(X.id)/Y.total*100,2),'%') AS project without missing items rate FROM 
(
  SELECT Q.id,CASE WHEN FIND_IN_SET(W.List of items to be added, Q.List of items already added)>0 THEN 1 ELSE 0 END AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
       WHERE r.id IN ('44930','44927')
    ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id
   GROUP BY Q.id
   HAVING COUNT(`Is it missing`)=SUM(`Is it missing`)
 )X,
 (
    -- Total number of reports SELECT COUNT(E.nums) AS total FROM
    (
      SELECT COUNT(r.id) AS nums FROM report r 
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
    )E    
 )Y 
 ;

Option 2:

Although the above solution 1 avoids line-by-line traversal and comparison, it is essentially a one-by-one comparison of items. So is there any way to avoid comparison?

Of course the answer is yes. We can determine whether it is fully included based on the statistical quantity.

1. Use union all to join the added items with the items to be added without removing duplicates.

 (
 -- Items that should be addedSELECT 
  r.id,m.project_id
FROM 
   report 
INNER JOIN application a ON r.app_id=a.id
INNER JOIN application_sample s ON a.id=s.app_id
INNER JOIN application_sample_item si ON s.id=si.sample_id       
INNER JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id,m.project_id
)
UNION ALL
(
 -- Added project select r.id,i.project_id from report r,report_item i 
where r.id = i.report_id and r.id IN ('44930','44927')
group by r.app_id,i.project_id
 )

From the results, we can see that there are duplicate items under the same report, which represent the items that should be added and the items that have been added.

2. According to the joint table results, the number of overlapping items in the statistical report

# Should add the number of items that overlap with those already added select tt.id,count(*) count from 
(
   select t.id,t.project_id,count(*) from 
   (
      (
        -- Items that should be addedSELECT 
          r.id,m.project_id
        FROM 
          report 
          INNER JOIN application a ON r.app_id=a.id
          INNER JOIN application_sample s ON a.id=s.app_id
          INNER JOIN application_sample_item si ON s.id=si.sample_id       
          INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        ORDER BY r.id,m.project_id
      )
      UNION ALL
      (
        -- Added project select r.id,i.project_id from report r,report_item i 
        where r.id = i.report_id and r.id IN ('44930','44927')
        group by r.app_id,i.project_id
      )
      
   )
   GROUP BY t.id,t.project_id
   HAVING count(*) >1 
) tt group by tt.id 

3. Compare the amount in the second step with the amount that should be added. If they are equal, it means there is no omission.

select bb.id, aa.count has been added, bb.count needs to be added,
    CASE WHEN aa.count/bb.count=1 THEN 1
    ELSE 0
    END AS 'Is it missing' 
from 
(
# Should add the number of items that overlap with those already added select tt.id,count(*) count from 
(
   select t.id,t.project_id,count(*) from 
   (
      (
        -- Items that should be addedSELECT 
          r.id,m.project_id
        FROM 
          report 
          INNER JOIN application a ON r.app_id=a.id
          INNER JOIN application_sample s ON a.id=s.app_id
          INNER JOIN application_sample_item si ON s.id=si.sample_id       
          INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        ORDER BY r.id,m.project_id
      )
      UNION ALL
      (
        -- Added project select r.id,i.project_id from report r,report_item i 
        where r.id = i.report_id and r.id IN ('44930','44927')
        group by r.app_id,i.project_id
      )
      
   )
   GROUP BY t.id,t.project_id
   HAVING count(*) >1 
) tt group by tt.id 
) aa RIGHT JOIN
(
  -- The number of items that should be addedSELECT 
    r.id,s.app_id,COUNT(m.project_id) count
  FROM 
    report 
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id       
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
  WHERE r.id IN ('44930','44927')
  GROUP BY r.id
  ORDER BY r.id,m.project_id
) bb ON aa.id = bb.id 
ORDER BY aa.id 

4. Find the no-missing rate

select 
    SUM(asr.`Is it missing?) AS No missing number, COUNT(asr.id) AS Total number, CONCAT(FORMAT(SUM(asr.`Is it missing?)/COUNT(asr.id)*100,5),'%') AS Report no missing rate from 
(
  select bb.id, aa.count has been added, bb.count needs to be added,
      CASE WHEN aa.count/bb.count=1 THEN 1
      ELSE 0
      END AS 'Is it missing' 
  from 
  (
  # Should add the number of items that overlap with those already added select tt.id,count(*) count from 
  (
     select t.id,t.project_id,count(*) from 
     (
        (
          -- Items that should be addedSELECT 
            r.id,m.project_id
          FROM 
            report 
            INNER JOIN application a ON r.app_id=a.id
            INNER JOIN application_sample s ON a.id=s.app_id
            INNER JOIN application_sample_item si ON s.id=si.sample_id       
            INNER JOIN set_project_mapping m ON si.set_id=m.set_id
          WHERE r.id IN ('44930','44927')
          ORDER BY r.id,m.project_id
        )
        UNION ALL
        (
          -- Added project select r.id,i.project_id from report r,report_item i 
          where r.id = i.report_id and r.id IN ('44930','44927')
          group by r.app_id,i.project_id
        )
        
     )
     GROUP BY t.id,t.project_id
     HAVING count(*) >1 
  ) tt group by tt.id 
  ) aa RIGHT JOIN
  (
    -- The number of items that should be addedSELECT 
      r.id,s.app_id,COUNT(m.project_id) count
    FROM 
      report 
      INNER JOIN application a ON r.app_id=a.id
      INNER JOIN application_sample s ON a.id=s.app_id
      INNER JOIN application_sample_item si ON s.id=si.sample_id       
      INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id
    ORDER BY r.id,m.project_id
  ) bb ON aa.id = bb.id 
  ORDER BY aa.id
) asr;

This is the end of this article about the steps to determine whether MySQL is a subset. For more information about how to determine whether MySQL is a subset, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySql8 WITH RECURSIVE recursive query parent-child collection method

<<:  W3C Tutorial (9): W3C XPath Activities

>>:  What is Nginx load balancing and how to configure it

Recommend

Solution to MySQL root password error number 1045

Stop MySQL Service Windows can right-click My Com...

Solution to multiple 302 responses in nginx proxy (nginx Follow 302)

Proxying multiple 302s with proxy_intercept_error...

Summary of practical methods for JS beginners to process arrays

join() method: connects all elements in an array ...

Summary of 10 advanced tips for Vue Router

Preface Vue Router is the official routing manage...

MySQL index principle and query optimization detailed explanation

Table of contents 1. Introduction 1. What is an i...

How to use the EXPLAIN command in SQL

In daily work, we sometimes run slow queries to r...

Complete steps for Nginx to configure anti-hotlinking

need: Usually, sites want to prevent videos and p...

HTML tutorial, easy to learn HTML language (2)

*******************Introduction to HTML language (...

Detailed explanation of MYSQL log and backup and restore issues

This article shares MYSQL logs and backup and res...

43 Web Design Mistakes Web Designers Should Watch Out For

This is an article about website usability. The a...

JavaScript code to implement Weibo batch unfollow function

A cool JavaScript code to unfollow Weibo users in...

How Database SQL SELECT Queries Work

As Web developers, although we are not profession...

Win10 installation Linux system tutorial diagram

To install a virtual machine on a Windows system,...