Implementation of MySQL Shell import_table data import

1. Introduction to import_table

In the last technical sharing we introduced four common ways of using MySQL Load Data to import text data into MySQL. In this issue we continue with a more efficient import tool from the MySQL Shell tool set: import_table, whose full name is the Parallel Table Import Utility. As the name suggests, it supports concurrent data import, and since MySQL Shell 8.0.23 the utility has become considerably more complete. Its core features are listed below, followed by a short sketch of the compressed-file import.

  • Covers essentially all the functionality of MySQL Load Data and can serve as a substitute for it
  • Concurrent import by default (with a configurable chunk size)
  • Wildcard matching, so multiple files can be imported into one table in a single run (ideal for aggregating identically structured data into one table)
  • Rate limiting (very useful in bandwidth-sensitive scenarios)
  • Handles compressed files
  • Imports into MySQL 5.7 and later
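
As a quick illustration of the compressed-file support, here is a minimal sketch, assuming an employees_01.csv.gz file exists at the path shown (the path is an assumption for illustration); import_table detects .gz and .zst inputs by their file extension and decompresses them during the import.

## Minimal sketch (MySQL Shell \py mode): importing a gzip-compressed file.
## The file path is an assumption for illustration; .gz and .zst inputs
## are detected by extension and decompressed on the fly.
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv.gz",
    ],
    {
        "schema": "employees",
        "table": "emp",
        "dialect": "csv-unix"
    })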

2. Load Data and import_table function examples

This section provides command examples of the same functionality implemented with Load Data and with import_table. As before, we use the employees sample data to demonstrate a comprehensive MySQL Load Data scenario:

  • Import columns in a custom order
  • Apply functions to the data during import
  • Generate custom column values

## The sample data is as follows
[root@10-186-61-162 tmp]# cat employees_01.csv
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10004","1954-05-01","Christian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"

## Sample table structure
10.186.61.162:3306  employees  SQL > desc emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(14)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
| full_name   | varchar(64)   | YES  |     | NULL    |       |  -- newly added field, not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       |  -- newly added field, not present in the exported data file
| delete_flag | varchar(1)    | YES  |     | NULL    |       |  -- newly added field, not present in the exported data file
+-------------+---------------+------+-----+---------+-------+

2.1 Importing data using Load Data

The specific meaning of each parameter is not explained here; for the syntax rules and their meanings, please refer to the earlier article in this series, <Multiple uses of MySQL Load Data>.

load data infile '/data/mysql/3306/tmp/employees_01.csv'
into table employees.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3),
    last_name=lower(@C4),
    full_name=concat(first_name,' ',last_name),
    gender=@C5,
    hire_date=@C6,
    modify_date = now(),
    delete_flag=if(hire_date<'1988-01-01','Y','N'); 
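
After the load, a quick sanity check confirms that the derived columns were populated. A minimal sketch, assuming an open connection in MySQL Shell's \py mode (the query itself is only illustrative):

## Sanity check after the load (sketch): inspect the derived columns
## full_name, modify_date and delete_flag.
res = session.run_sql(
    "SELECT emp_no, full_name, modify_date, delete_flag "
    "FROM employees.emp LIMIT 5")
for row in res.fetch_all():
    print(row)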

2.2 Importing data using import_table

util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    }) 

3. Features specific to import_table

3.1 Multiple file import (wildcard matching)

## Before importing, three separate employees files were generated with identical structure
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import command, using employees_* for wildcard matching
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_*",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    })
    
## Import command, with each file path specified explicitly
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
        "/data/mysql/3306/tmp/employees_02.csv",
        "/data/mysql/3306/tmp/employees_03.csv"
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    }) 
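
Before running a wildcard import, it can be useful to preview which files the pattern will pick up. A plain-Python sketch (import_table's own pattern matching supports the * and ? wildcards; the glob shown here is only an approximation for preview purposes):

## Optional pre-check (plain Python sketch): list the files a wildcard
## pattern would match before importing them.
import glob

for path in sorted(glob.glob("/data/mysql/3306/tmp/employees_*")):
    print(path)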

3.2 Concurrent Import

Before experimenting with concurrent import, we create an sbtest1 table with 10 million rows (about 2 GB of data) to simulate a concurrent workload. import_table uses the threads option to control concurrency, and the default is 8 threads.

## Export the sbtest1 data required for the test
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv
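
For reference, a file like sbtest1.csv can be produced with SELECT ... INTO OUTFILE. A sketch run from MySQL Shell's \py mode, assuming demo.sbtest1 is already populated (for example by sysbench) and that secure_file_priv allows writing to the target directory:

## Sketch: export demo.sbtest1 to CSV, matching the format imported below.
## Assumes the table is populated and secure_file_priv permits the path.
session.run_sql(
    "SELECT * FROM demo.sbtest1 "
    "INTO OUTFILE '/data/mysql/3306/tmp/sbtest1.csv' "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n'")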

## Set threads to 8 for concurrent import
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "8"
    }) 

3.3 Import rate control

You can use maxRate together with threads to control the import rate of each concurrent thread. For example, with the current configuration of 4 threads and a per-thread rate of 2M/s, the total rate will not exceed 8M/s.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "maxRate": "2M"
    }) 

3.4 Custom chunk size

The default chunk size is 50M. We can reduce the chunk size to shrink the transaction size; for example, with the chunk size set to 1M, the amount of data each thread imports per batch is reduced accordingly.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "bytesPerChunk": "1M",
        "maxRate": "2M"
    }) 

4. Load Data vs import_table performance comparison

  • Use the same schema and table
  • Import the data as is, with no special processing
  • Leave optional parameters at their defaults, specifying only the required ones
-- Load Data statement
load data infile '/data/mysql/3306/tmp/sbtest1.csv'
into table demo.sbtest1
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

-- import_table statement
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4"
    }) 
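
The timings below come from each tool's own progress output; as a rough cross-check, wall-clock time can also be measured directly in \py mode. A sketch, mirroring the import_table call above:

## Sketch: rough wall-clock timing of the import from MySQL Shell's \py mode.
import time

start = time.time()
util.import_table(
    ["/data/mysql/3306/tmp/sbtest1.csv"],
    {"schema": "demo", "table": "sbtest1", "dialect": "csv-unix"})
print("elapsed: %.1f s" % (time.time() - start))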

In our test, Load Data took about 5 minutes, while import_table completed the same import in less than half that time, more than doubling the efficiency (measured in a virtual machine environment with limited disk IO capacity).

5. Technical Summary

  • import_table provides almost all the functionality of Load Data
  • import_table imports data more efficiently than Load Data
  • import_table supports fine-grained control over the import rate, concurrency, and the size of each data chunk
  • import_table produces a more detailed progress report, which helps with troubleshooting and time estimation, including:
    • import rate
    • total import time
    • the amount of data imported in each batch, and whether any Warnings occurred
    • a final summary report
