1. Data Deduplication

In daily work, exporting query results from Hive or Impala sometimes produces duplicate rows. Re-running the query is often unattractive (the query is slow and the exported file is large), so it is easier to remove the duplicates from the exported file with Linux commands. The following is an example.

In this example, aaa.txt contains 3 duplicate rows, and only one copy of each should be kept:

sort aaa.txt | uniq > bbb.txt

This removes the duplicate rows from aaa.txt and writes the result to bbb.txt. The sort step is needed because uniq only collapses identical lines that are adjacent. After the command runs, bbb.txt keeps only one copy of each row.

2. Data intersection, union, and difference

1) Intersection (equivalent to user_2019 inner join user_2020 on user_2019.user_no=user_2020.user_no)

2) Union (equivalent to user_2019.user_no union user_2020.user_no)

3) Difference
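The three set operations listed above can be sketched with sort and comm. This is a minimal sketch, not necessarily the commands the author used. It assumes each year's user numbers are stored one per line in two hypothetical files, user_2019.txt and user_2020.txt, and the sample values are made up for illustration. The difference here is taken as "in 2019 but not in 2020".

```shell
#!/bin/sh
# Assumption: file names and sample user numbers are hypothetical.
export LC_ALL=C                      # consistent collation for sort and comm
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1001\n1002\n1003\n1002\n' > user_2019.txt
printf '1002\n1003\n1004\n' > user_2020.txt

# comm requires sorted input; sort -u also drops duplicates within each file.
sort -u user_2019.txt > user_2019.sorted
sort -u user_2020.txt > user_2020.sorted

# Intersection: lines present in both files (hide columns 1 and 2).
comm -12 user_2019.sorted user_2020.sorted > intersection.txt

# Union: every distinct line found in either file.
sort -u user_2019.txt user_2020.txt > union.txt

# Difference: lines only in the 2019 file (hide columns 2 and 3).
comm -23 user_2019.sorted user_2020.sorted > difference.txt
```

Swapping the two file arguments in the last comm command gives the opposite difference (in 2020 but not in 2019).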