Example of utf8mb4 collation in MySQL

Common utf8mb4 collations in MySQL are:

  • utf8mb4_0900_ai_ci
  • utf8mb4_unicode_ci
  • utf8mb4_general_ci

When the default character set of a table is set to utf8mb4 but the collation is not explicitly specified:

  • In MySQL version 5.7, the default collation is utf8mb4_general_ci.
  • In MySQL version 8.0, the default collation is utf8mb4_0900_ai_ci.
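
The default on a particular server can be checked with statements along the following lines. This is a minimal sketch: demo_default_tb is a hypothetical table name used only for illustration, and the result depends on the server version and configuration.

```sql
-- List the collation that this server marks as the default for utf8mb4.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND `Default` = 'Yes';

-- Create a table with only the character set specified,
-- then inspect the collation it was actually given.
CREATE TABLE demo_default_tb (c1 VARCHAR(10)) DEFAULT CHARSET = utf8mb4;

SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'demo_default_tb';

-- Clean up the illustration table.
DROP TABLE demo_default_tb;
```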

Because the utf8mb4_0900_ai_ci collation was introduced in MySQL 8.0, importing a MySQL 8.0 table into MySQL 5.7 or MySQL 5.6 can fail because the collation is not recognized:

[Err] 1273 - Unknown collation: 'utf8mb4_0900_ai_ci'
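
One possible way to avoid this error is to convert the affected tables to a collation that the older server also knows before exporting them. The sketch below assumes it is run on the MySQL 8.0 source; demo_db.demo_tb is a hypothetical table name. Converting a table changes how its columns compare and sort, so it should be tested before being applied to real data.

```sql
-- Find tables that still use the 8.0-only collation.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_COLLATION = 'utf8mb4_0900_ai_ci';

-- Convert one such table (hypothetical name) to a collation
-- that MySQL 5.6 and 5.7 also recognize.
ALTER TABLE demo_db.demo_tb
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```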

Parameter Control

In MySQL version 5.6, the collation_server parameter is used to set the default collation at the server level.

  • The collation_database parameter holds the collation of the default (current) database. When there is no default database, it has the same value as collation_server.
  • If no character set or collation is specified when creating a database, the server-level defaults (character_set_server and collation_server) are used.
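
The relationship between these parameters can be observed directly. The following is an illustrative sketch; demo_db is a hypothetical database name, and the values returned depend on the server configuration.

```sql
-- Show the server-level default and the default of the current database.
SELECT @@collation_server, @@collation_database;

-- Create a database without naming a collation,
-- then check which default it received.
CREATE DATABASE demo_db;

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

-- Clean up the illustration database.
DROP DATABASE demo_db;
```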

The character_set_database and collation_database parameters are deprecated in MySQL 5.7 and will be removed in a later version.

MySQL 8.0 adds a new parameter, default_collation_for_utf8mb4, to control the default collation used with the utf8mb4 character set. Its value is either utf8mb4_0900_ai_ci or utf8mb4_general_ci.

The default_collation_for_utf8mb4 parameter takes effect under the following conditions:

  • When the SHOW COLLATION and SHOW CHARACTER SET statements are executed.
  • When a database is created or altered with utf8mb4 specified but no collation.
  • When a table is created or altered with utf8mb4 specified but no collation.
  • When a column is added or modified with utf8mb4 specified but no collation.
  • In any other statement that specifies utf8mb4 without a collation.
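
A minimal sketch of the table case, assuming MySQL 8.0.11 or later and a session with sufficient privileges to set the variable. The MySQL documentation describes this variable as intended mainly for internal use by replication, so the example is for illustration only; demo_tb is a hypothetical table name.

```sql
-- Switch the utf8mb4 default for this session only.
SET @@session.default_collation_for_utf8mb4 = 'utf8mb4_general_ci';

-- Specify utf8mb4 without a collation, then inspect the definition
-- to see which collation the table received.
CREATE TABLE demo_tb (c1 VARCHAR(10)) DEFAULT CHARSET = utf8mb4;
SHOW CREATE TABLE demo_tb;

-- Clean up and restore the 8.0 default for the session.
DROP TABLE demo_tb;
SET @@session.default_collation_for_utf8mb4 = 'utf8mb4_0900_ai_ci';
```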

Comparison between utf8mb4_unicode_ci and utf8mb4_general_ci

1. Accuracy

  • The utf8mb4_unicode_ci collation is based on the standard Unicode Collation Algorithm (UCA), can handle special characters, and sorts accurately across many languages.
  • The utf8mb4_general_ci collation is a simplified collation that does not implement the full Unicode Collation Algorithm, so it cannot handle some special characters correctly.

2. Performance

  • The utf8mb4_general_ci collation is somewhat faster at comparison and sorting.
  • The utf8mb4_unicode_ci collation implements a more complex algorithm to handle special characters, so it is slightly slower.
  • In most scenarios, there is no significant performance difference between the two.
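
A commonly cited example of the accuracy difference is the German letter ß: according to the MySQL documentation, ß compares equal to "ss" under utf8mb4_unicode_ci but equal to "s" under utf8mb4_general_ci. The sketch below checks this with explicit collations; the column aliases are made up for readability.

```sql
-- 0xC39F is the UTF-8 encoding of 'ß' (U+00DF).
-- COLLATE binds more tightly than '=', so each comparison
-- is performed with the collation named on its right-hand side.
SELECT
  _utf8mb4 0xC39F = _utf8mb4 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci_eq_ss,
  _utf8mb4 0xC39F = _utf8mb4 'ss' COLLATE utf8mb4_general_ci AS general_ci_eq_ss,
  _utf8mb4 0xC39F = _utf8mb4 's'  COLLATE utf8mb4_general_ci AS general_ci_eq_s;
```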

Supplement: utf8mb4 and equality query test in MySQL

Test Scripts

## Delete the test tables
DROP TABLE IF EXISTS tb2001;
DROP TABLE IF EXISTS tb2002;
DROP TABLE IF EXISTS tb2003;

## Create the test tables
CREATE TABLE tb2001(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_unicode_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4 ;

CREATE TABLE tb2002(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_general_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

CREATE TABLE tb2003(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_0900_ai_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

## Insert test data
INSERT INTO tb2001(c1,c2) VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2002(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2003(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);

## Equality query test
SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
SELECT * FROM tb2003 WHERE c1=0xF09F8D83;

SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
SELECT * FROM tb2003 WHERE c2=0xF09F8D83;

Test Results

mysql> SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
| 2 | 🦊 | 🦊 |
| 3 | 🌠 | 🌠 |
+----+------+------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
| 2 | 🦊 | 🦊 |
| 3 | 🌠 | 🌠 |
+----+------+------+
3 rows in set (0.01 sec)

mysql> SELECT * FROM tb2003 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2003 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1 | c2 |
+----+------+------+
| 1 | 🍃 | 🍃 |
+----+------+------+
1 row in set (0.00 sec)

Test Summary

  • The utf8mb4 character set can store special characters such as emoji.
  • The utf8mb4_unicode_ci and utf8mb4_general_ci collations do not "correctly match" special characters: in the test, an equality query for a single emoji returned all three emoji rows.
  • The utf8mb4_0900_ai_ci collation can "correctly match" special characters, but it is available only in MySQL 8.0 and later.
  • The utf8mb4_bin collation compares characters by their binary values, can "correctly match" special characters, and is available in every MySQL version that supports utf8mb4.
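
If a column has to keep utf8mb4_unicode_ci or utf8mb4_general_ci, a single query can still request a binary comparison with an explicit COLLATE clause. The sketch below reuses tb2001 from the test script; the column aliases are made up, and applying COLLATE to a column may prevent an index on that column from being used. The first statement compares the sort weights assigned to two different emoji: identical weights under utf8mb4_general_ci would be consistent with the matching behavior seen in the test.

```sql
-- Compare the sort weights that each collation assigns to two different emoji.
SELECT
  HEX(WEIGHT_STRING(_utf8mb4 0xF09F8D83 COLLATE utf8mb4_general_ci)) AS leaf_general,
  HEX(WEIGHT_STRING(_utf8mb4 0xF09FA68A COLLATE utf8mb4_general_ci)) AS fox_general,
  HEX(WEIGHT_STRING(_utf8mb4 0xF09F8D83 COLLATE utf8mb4_bin)) AS leaf_bin,
  HEX(WEIGHT_STRING(_utf8mb4 0xF09FA68A COLLATE utf8mb4_bin)) AS fox_bin;

-- Force a binary comparison on a column declared with utf8mb4_unicode_ci.
SELECT * FROM tb2001 WHERE c1 COLLATE utf8mb4_bin = _utf8mb4 0xF09F8D83;
```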
