Unicode signature BOM detailed description

Unicode Signature BOM - What is the BOM?
BOM is the abbreviation of Byte Order Mark. It is the character U+FEFF placed at the start of a text stream to signal the byte order and, by extension, which UTF encoding is in use. In UTF-16 it appears as FE FF (big-endian) or FF FE (little-endian), and in UTF-8 it becomes EF BB BF. The mark is optional. Since UTF-8 bytes have no byte order, the BOM there serves only as a signature that can be used to detect whether a byte stream is UTF-8 encoded. Microsoft software does this detection, but some software does not and treats the BOM as a normal character.
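As a rough illustration, the byte sequences for the BOM can be derived by encoding the single character U+FEFF in each UTF form. This is a minimal Python sketch; the helper name `bom_bytes` is made up for this example.

```python
# U+FEFF is the code point that serves as the byte order mark.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> str:
    """Return the BOM encoded in `encoding` as spaced, upper-case hex."""
    # The explicit -be/-le codecs do not prepend a BOM of their own,
    # so the result is exactly the encoded form of U+FEFF.
    return BOM_CHAR.encode(encoding).hex(" ").upper()


if __name__ == "__main__":
    for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(f"{name:<10} {bom_bytes(name)}")
```

Run as a script, this prints EF BB BF for UTF-8, FE FF and FF FE for the two UTF-16 byte orders, and the corresponding four-byte forms for UTF-32.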

Microsoft tools have traditionally added the three bytes EF BB BF at the start of the UTF-8 text files they save. Programs such as Notepad on Windows use these three bytes to decide whether a text file is ASCII or UTF-8. The Unicode standard permits this signature for UTF-8 but neither requires nor recommends it, and UTF-8 text files on other platforms usually do not carry it.

Unicode Signature BOM - How to check whether a UTF-8 file has a BOM

In other words, a UTF-8 file may or may not have a BOM. How can you tell the two apart?
Here are four methods.
1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, check the page properties, and see whether "Include Unicode Signature BOM" is checked.
3. Open the file with Windows Notepad, select "Save As", and check whether the default encoding of the file is UTF-8 or ANSI. If it is ANSI, the file has no BOM.

4. Open it with EmEditor, select "Save As", and check whether "Add Unicode Signature (BOM)" under Encoding is checked.
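The same check can also be done without an editor by reading the first three bytes of the file. The sketch below assumes Python is available. The function name `has_utf8_bom` is made up for this example, while `codecs.BOM_UTF8` is the standard-library constant for EF BB BF.

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    # Open in binary mode so that no decoding happens before the comparison.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

For example, `has_utf8_bom("index.php")` (a hypothetical file name) returns `True` only when the file carries the signature.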

Unicode Signature BOM - Problems and Solutions when Applying in PHP

Note that when using ConvertZ to convert a GB2312 file to UTF-8, the default setting is to not include a BOM. Without a BOM, a program that relies on the signature to detect UTF-8 may display garbled characters. However, if a BOM is included, be careful with PHP include files: the bytes EF BB BF sit in front of the PHP code in the byte stream, so they are sent to the output before the code runs, which may cause program errors (for example, headers can no longer be sent once output has started). One solution is to save all included files as ANSI or as UTF-8 without a BOM, and the main file can be UTF-8. To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (EF BB BF) with 20, save (remember to turn off the automatic backup function when saving), then switch back to the default editing mode and delete the three leading spaces.
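As an alternative to editing the bytes by hand, the BOM can be stripped with a few lines of script. This is only a sketch, assuming Python is available and the file fits in memory. The function name `strip_utf8_bom` is made up for this example.

```python
import codecs


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM at the start, leave the file untouched
    # Rewrite the file without the first three bytes (EF BB BF).
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

The function rewrites the file in place, so it is safer to try it on a copy first.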

Unicode Signature BOM - Encoding tips

I also learned a few things about encoding. Files saved with the encoding option called "Unicode" are actually UTF-16, whose code units coincide with Unicode code points for characters in the Basic Multilingual Plane. Conceptually, though, Unicode and UTF are two different things: Unicode is a character set that assigns a code point to each character, and a UTF is a scheme for storing and transmitting those code points. UTF-16 comes in two byte orders: high byte first (BE, big-endian) and low byte first (LE, little-endian). The official UTF encodings also include UTF-32, which likewise comes in LE and BE. UTF-7, which is not part of the Unicode standard, was mainly used for email transmission. The single-byte part of UTF-8 is compatible with ASCII, which is the lower half of ISO-8859-1. UTF-8 came about largely because some old systems and library functions cannot handle UTF-16 correctly. For English text it also saves file space, at the expense of using more space for some non-English characters. ASCII characters take one byte in both UTF-8 and ISO-8859-1, while all other characters take two to four bytes in UTF-8.
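These size and byte-order differences are easy to see by encoding a few sample characters. The Python sketch below is illustrative only; the helper name `encoded_hex` and the sample characters are chosen for this example.

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Return `text` encoded in `encoding` as spaced, upper-case hex."""
    return text.encode(encoding).hex(" ").upper()


if __name__ == "__main__":
    # One ASCII letter, one Latin-1 letter, one CJK character, one emoji.
    for ch in ("A", "\u00e9", "\u4e2d", "\U0001f600"):
        print(
            f"U+{ord(ch):04X}",
            "utf-8:", encoded_hex(ch, "utf-8"),
            "| utf-16-be:", encoded_hex(ch, "utf-16-be"),
            "| utf-16-le:", encoded_hex(ch, "utf-16-le"),
        )
```

The letter A takes one byte in UTF-8 (41) and two in UTF-16 (00 41 in BE, 41 00 in LE). U+00E9, which is a single byte in ISO-8859-1, takes two bytes in UTF-8, the CJK character takes three, and the emoji takes four.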
