Solution for importing more data from MySQL into Hive

Original import command:

bin/sqoop import \
--connect jdbc:mysql://192.168.169.128:3306/yubei \
--username root \
--password 123456 \
--table yl_city_mgr_evt_info \
--split-by rec_id \
-m 4 \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database default \
--hive-table yl_city_mgr_evt_info

Cause analysis: A MySQL text field may itself contain characters such as '\n'. Hive uses '\n' as the default line terminator, so every embedded newline is treated as a record boundary, and the Hive table ends up with more rows than the source table.
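The effect can be reproduced without a cluster: once a file is split on line terminators, any field containing '\n' turns a single record into two rows. A minimal bash sketch (the record content below is made up for illustration):

```shell
# One MySQL record: rec_id, city, and a text field that itself contains '\n'
record=$'1\tcityA\tline one of remark\nline two of remark'

# Hive's default line terminator is '\n', so this single record is seen as 2 rows:
rows_before=$(printf '%s\n' "$record" | wc -l)
echo "rows before: $rows_before"    # -> rows before: 2

# --hive-drop-import-delims strips '\n', '\r' and '\01' from string fields;
# simulated here by deleting the embedded newline:
rows_after=$(printf '%s\n' "${record//$'\n'/}" | wc -l)
echo "rows after: $rows_after"      # -> rows after: 1
```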

Solution:

When importing, add the --hive-drop-import-delims option, which strips \n, \r, and \01 from string fields.

Final import command:

bin/sqoop import \
--connect jdbc:mysql://192.168.169.128:3306/yubei \
--username root \
--password 123456 \
--table yl_city_mgr_evt_info \
--split-by rec_id \
-m 4 \
--hive-drop-import-delims \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database default \
--hive-table yl_city_mgr_evt_info

Refer to the official documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Supplement: Pitfalls encountered when importing MySQL data into Hive with Sqoop

1. Sqoop import to HDFS

1.1 Executing the sqoop job automatically updates the last-value

# Create a sqoop incremental-import job (note the space-separated " -- import")
bin/sqoop job --create sqoop_hdfs_test02 -- import \
--connect jdbc:mysql://localhost:3306/pactera_test \
--username root \
--password 123456 \
--table student \
--target-dir /user/sqoop/test002/ \
--fields-terminated-by "\t" \
--check-column last_modified \
--incremental lastmodified \
--last-value "2018-12-12 00:03:00" \
--append

Note: The --append parameter is required; without it, running the job a second time fails with an error (the target directory already exists).

At this point, the sqoop job has been built!
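For reference, the saved job can then be listed, inspected, and executed with Sqoop's standard job subcommands (Sqoop 1.4.x syntax; --show also prints the currently saved last-value):

```shell
bin/sqoop job --list                      # list all saved jobs
bin/sqoop job --show sqoop_hdfs_test02    # show the job definition, incl. the saved last-value
bin/sqoop job --exec sqoop_hdfs_test02    # run the incremental import
```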

2. Create a Hive table over the data imported by sqoop

create external table if not exists student_hive (
  SId int, Sname string, Sage string, Ssex string, last_modified timestamp
)
row format delimited fields terminated by '\t'
location 'hdfs://node01:8020/user/sqoop/test002/';

Note: The time column in Hive must be declared as timestamp. If it is declared as date, the values coming from the database cannot be loaded correctly.

The first full load completes, the whole pipeline works end to end, and the Hive table returns data when queried.
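As a sanity check, the row counts on both sides can be compared (hostnames, credentials, and table names follow the examples above; a sketch, not a tested pipeline):

```shell
# Row count on the MySQL side (credentials as in the job definition above)
mysql -h localhost -uroot -p123456 -e \
  "SELECT COUNT(*) FROM pactera_test.student;"

# Row count on the Hive side; the two numbers should match after the full load
hive -e "SELECT COUNT(*) FROM student_hive;"
```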

-----------------------Key dividing line-----------------------

* With lastmodified incremental imports, sqoop saves the system time at which the job ran as the new last-value. Any row whose check-column value is earlier than the saved last-value (i.e., the previous job's execution time) will not be loaded.

The row with SId=6 was not loaded because its last_modified was earlier than the saved last-value; after updating it to the current time (2018-12-26 17:05) and re-running, the row was imported successfully.
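The skip/load decision can be illustrated with a plain string comparison, since 'YYYY-MM-DD HH:MM:SS' timestamps sort lexicographically in chronological order; Sqoop's generated query behaves roughly like "WHERE check_column >= last_value" (the timestamps below are hypothetical):

```shell
# last-value saved by the previous job run (hypothetical)
last_value='2018-12-26 17:00:00'

row_sid6='2018-12-12 00:05:00'   # modified before last_value -> filtered out
row_fresh='2018-12-26 17:05:00'  # touched after last_value  -> imported

if [ "$row_sid6" \< "$last_value" ]; then
  echo "SId=6: skipped (last_modified < last-value)"
fi
if [ ! "$row_fresh" \< "$last_value" ]; then
  echo "fresh row: imported"
fi
```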

Summary:

When using the lastmodified format for sqoop incremental imports:

1. Remember to include --append;

2. The last-value is recorded as the system time at which the job ran, so when preparing test data make sure the check-column values are accurate and strictly increasing;

3. When something goes wrong, consult the documentation and pinpoint the exact cause in your own environment.

The above is my personal experience. I hope it can give you a reference. I also hope that you will support 123WORDPRESS.COM. If there are any mistakes or incomplete considerations, please feel free to correct me.

