Why do web pages use UTF-8 instead of GBK or GB2312?

If you have a choice, you should use UTF-8.

In fact, Windows itself has long since switched to Unicode internally, and GBK remains only as a stopgap for compatibility with Chinese standards.

GBK is a variable-width encoding: ASCII characters such as English letters take one byte, and Chinese characters take two bytes. To tell the two apart, the first byte of every two-byte character has its highest bit set to 1.

UTF-8 is a variable-width encoding of Unicode, designed to handle international characters. It uses 8 bits (one byte) for English letters and 24 bits (three bytes) for most Chinese characters; rarer characters outside the Basic Multilingual Plane take four bytes. English text therefore costs nothing extra in UTF-8.
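The byte counts are easy to check. The following is a minimal sketch using Python's built-in "gbk" and "utf-8" codecs; the helper names encoded_length and high_bit_set are made up for this illustration.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


def high_bit_set(byte: int) -> bool:
    """True when the top bit of an 8-bit value is 1 (value >= 0x80)."""
    return (byte & 0x80) != 0


# One ASCII letter and one common Chinese character.
samples = {"English letter": "A", "Chinese character": "中"}

for label, ch in samples.items():
    gbk_bytes = ch.encode("gbk")
    utf8_bytes = ch.encode("utf-8")
    print(
        f"{label}: GBK {gbk_bytes.hex()} ({len(gbk_bytes)} byte(s)), "
        f"UTF-8 {utf8_bytes.hex()} ({len(utf8_bytes)} byte(s)), "
        f"GBK first byte high bit set: {high_bit_set(gbk_bytes[0])}"
    )
```

The letter takes one byte in both encodings, while the Chinese character takes two bytes in GBK and three in UTF-8, and only the Chinese character's first GBK byte has its high bit set.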

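GBK covers the Chinese characters in common use, both simplified and traditional (roughly 21,000 in total), but not every Chinese character.

UTF-8 can encode every character in Unicode, which covers the writing systems of all countries in the world.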
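To see the difference in coverage, one can simply try to encode a few characters. This is a rough sketch with Python's standard codecs; can_encode is a hypothetical helper, and the three sample characters are arbitrary choices for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A Chinese character, a Korean syllable, and an emoji.
samples = {"Chinese": "中", "Korean": "한", "emoji": "😀"}

for label, ch in samples.items():
    print(f"{label}: GBK={can_encode(ch, 'gbk')}, UTF-8={can_encode(ch, 'utf-8')}")
```

GBK accepts only the Chinese character here, while UTF-8 accepts all three.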

GBK is an extension of the national standard GB2312 and remains backward compatible with it. GBK itself is a technical specification rather than a national standard; its successor, GB 18030, is the national standard.

UTF-8 encoded text can be displayed by any browser that supports UTF-8, in any country.
For example, a UTF-8 page containing Chinese displays correctly in an English version of IE without the user having to download IE's Chinese language support package.

Therefore, for forums with mostly English content, UTF-8 costs nothing extra: each English character takes one byte in UTF-8, the same as in GBK.

Please note: although UTF-8 has good international compatibility, Chinese text needs about 50% more database storage in UTF-8 than in GBK or Big5 (three bytes per character instead of two). If storage space matters more to you than international compatibility, this is the main argument against UTF-8.

In simple terms:
For forums with a lot of Chinese text, GBK saves database space.
For forums with mostly English content, UTF-8 takes no more space than GBK and adds international compatibility.
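The storage trade-off can be estimated by encoding sample posts both ways. This is only a sketch: storage_cost is a hypothetical helper, the two sample strings are invented, and a real database adds its own per-row overhead on top of these raw byte counts.

```python
def storage_cost(text: str) -> dict:
    """Return the raw byte size of `text` in GBK and in UTF-8."""
    return {encoding: len(text.encode(encoding)) for encoding in ("gbk", "utf-8")}


english_post = "Hello, world"  # 12 ASCII characters
chinese_post = "你好，世界"      # 5 characters, including a full-width comma

for label, post in (("English", english_post), ("Chinese", chinese_post)):
    cost = storage_cost(post)
    print(
        f"{label}: {len(post)} characters, "
        f"GBK {cost['gbk']} bytes, UTF-8 {cost['utf-8']} bytes"
    )
```

The English post is 12 bytes either way; the Chinese post is 10 bytes in GBK and 15 bytes in UTF-8, which is the 50% difference.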

What are the differences between GBK and GB2312?

First of all, what is GBK, and what is GB2312? Both are character encodings, and there are many other character encodings besides them.

We can understand character encoding as follows:

Computers store binary values, 0s and 1s.

8 bits make up one byte, which is usually written in hexadecimal.

So how do we get the computer to display the characters we want instead of strings of 0s and 1s?

The computer has to convert the stored values into the corresponding characters, whether English, Chinese, or any other language, and then output them to the screen.

So a character encoding is a set of rules that specifies which stored value corresponds to which character displayed on the screen.
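This mapping between stored values and characters can be tried out directly. Below is a small sketch in Python; to_hex and from_hex are hypothetical helper names, and the hexadecimal values are those produced by Python's standard "ascii", "gbk" and "utf-8" codecs.

```python
def to_hex(ch: str, encoding: str) -> str:
    """Character -> stored bytes, written in hexadecimal."""
    return ch.encode(encoding).hex()


def from_hex(hex_string: str, encoding: str) -> str:
    """Stored bytes (in hexadecimal) -> character, using the rules of `encoding`."""
    return bytes.fromhex(hex_string).decode(encoding)


print(to_hex("A", "ascii"))    # 41
print(to_hex("中", "gbk"))      # d6d0
print(to_hex("中", "utf-8"))    # e4b8ad

# The reverse direction: the same character comes back only when the
# bytes are read with the rules they were written with.
print(from_hex("d6d0", "gbk") == "中")      # True
print(from_hex("e4b8ad", "utf-8") == "中")  # True
```

The same character is stored as different values under different encodings, which is why the reader of the bytes has to know which set of rules was used.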

To sum up, GBK and GB2312 are both character encodings.

Let's talk about their differences and similarities in detail below:

Similarities:

1. GBK and GB2312 both use 16 bits (two bytes) for each Chinese character; ASCII characters take one byte.

2. Both are commonly declared as the charset in the meta tags of web pages.
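As an illustration of how the meta tag and the actual bytes have to agree, here is a hedged sketch in Python. The functions render_page and declared_charset and the page layout are invented for this example; real browsers follow a more elaborate detection procedure (HTTP headers, byte order marks, and so on) that is not modelled here.

```python
import re
from typing import Optional

# Matches both <meta charset="gbk"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> form.
META_CHARSET = re.compile(r"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def render_page(body: str, charset: str) -> bytes:
    """Build a tiny HTML page and encode it with the charset it declares."""
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="{charset}"><title>demo</title></head>\n'
        f"<body>{body}</body>\n"
        "</html>\n"
    )
    return html.encode(charset)


def declared_charset(page: bytes) -> Optional[str]:
    """Read the charset from the meta tag, or return None if there is none.

    The tag itself is plain ASCII, so it can be found before the real
    encoding of the rest of the page is known.
    """
    head = page[:1024].decode("ascii", errors="replace")
    match = META_CHARSET.search(head)
    return match.group(1).lower() if match else None


page = render_page("中文", "gbk")
charset = declared_charset(page)
print(charset)                          # gbk
print("中文" in page.decode(charset))    # True
```

The page decodes correctly only because the declared charset matches the encoding actually used to produce the bytes.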

Differences:

1. GBK supports both Simplified Chinese and Traditional Chinese!

GBK stands for "Chinese Internal Code Extension Specification" (the letters G, B and K are the pinyin initials of Guobiao Kuozhan, "national standard extension"; its English name is Chinese Internal Code Specification). It was formulated by the National Technical Committee of Information Technology Standardization of the People's Republic of China on December 1, 1995. On December 15, 1995, the Standardization Department of the State Administration of Technical Supervision and the Science and Technology and Quality Supervision Department of the Ministry of Electronics Industry jointly designated it a technical specification guidance document in Technical Supervision Letter [1995] No. 229.

2. GB2312 supports only Simplified Chinese!

GB 2312, the "Chinese Character Coded Character Set for Information Interchange", is a national standard issued by the General Administration of Standards of China in 1980 and implemented on May 1, 1981. The standard number is GB 2312-1980.
The GB 2312 standard includes 6,763 Chinese characters in total: 3,755 first-level characters and 3,008 second-level characters. It also includes 682 full-width non-Chinese characters, among them Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.
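The Simplified versus Traditional difference shows up as soon as a traditional character is encoded. The following is a minimal sketch using Python's standard "gb2312" and "gbk" codecs; is_encodable is a hypothetical helper, and the two characters are the simplified and traditional forms of the word for "country".

```python
def is_encodable(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "国"   # simplified form
traditional = "國"  # traditional form of the same word

for encoding in ("gb2312", "gbk"):
    print(
        f"{encoding}: simplified={is_encodable(simplified, encoding)}, "
        f"traditional={is_encodable(traditional, encoding)}"
    )

# GBK is backward compatible: characters that exist in GB2312 keep the same bytes.
print(simplified.encode("gb2312") == simplified.encode("gbk"))  # True
```

GB2312 rejects the traditional character while GBK accepts both, and the simplified character is stored as the same two bytes in either encoding.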

If your web pages are mainly for Chinese readers, GB2312 or GBK works well, and Chinese text takes less storage. If your pages are meant to be viewed around the world and you use GB2312 or GBK, browsers on computers without support for these encodings will show the Chinese text as unreadable garbled characters.
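The garbling can be reproduced by decoding GBK bytes with a different encoding. This is a simplified sketch, not a model of any particular browser: view_page is a hypothetical function, and the fallback encodings "latin-1" and "utf-8" are assumptions chosen for illustration.

```python
def view_page(page_bytes: bytes, assumed_encoding: str) -> str:
    """Decode page bytes the way a viewer that assumes `assumed_encoding` would.

    Bytes that are invalid in the assumed encoding become U+FFFD,
    the Unicode replacement character.
    """
    return page_bytes.decode(assumed_encoding, errors="replace")


original = "中文网页"
page_bytes = original.encode("gbk")  # the page is saved as GBK

as_gbk = view_page(page_bytes, "gbk")         # correct guess
as_latin1 = view_page(page_bytes, "latin-1")  # wrong guess: accented Latin letters
as_utf8 = view_page(page_bytes, "utf-8")      # wrong guess: replacement characters

print(as_gbk == original)   # True
print(as_latin1)            # garbled text, one character per byte
print("\ufffd" in as_utf8)  # True
```

Only the viewer that assumes GBK recovers the original text; the other two produce the garbled characters described above.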
