How to use JavaScript to determine whether a file is UTF-8 encoded

Conventional solution

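Use FileReader to read the file as UTF-8 text, then check whether the decoded content contains garbled characters. Byte sequences that are not valid UTF-8 are decoded to the replacement character � (U+FFFD).

If � appears in the content, the file is treated as not UTF-8 encoded; otherwise it is treated as UTF-8. Note that a valid UTF-8 file that genuinely contains the character � would also be rejected by this check.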
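The effect can be reproduced without a file. The following is a minimal sketch, assuming an environment where `TextDecoder` is available (modern browsers and Node.js); `looksLikeUtf8` is a hypothetical helper name, not part of the article's code. It decodes raw bytes leniently, as `readAsText` does, and looks for U+FFFD.

```typescript
// Decode raw bytes as UTF-8 in lenient mode: malformed sequences become
// U+FFFD instead of throwing, which is also how readAsText behaves.
const REPLACEMENT_CHAR = "\uFFFD";

// Hypothetical helper: true when decoding produced no replacement character.
function looksLikeUtf8(bytes: Uint8Array): boolean {
  const text = new TextDecoder("utf-8").decode(bytes);
  return !text.includes(REPLACEMENT_CHAR);
}

// The character "中" encoded as UTF-8 (3 bytes) and as GBK (2 bytes).
const utf8Bytes = new Uint8Array([0xe4, 0xb8, 0xad]);
const gbkBytes = new Uint8Array([0xd6, 0xd0]);

console.log(looksLikeUtf8(utf8Bytes)); // true
console.log(looksLikeUtf8(gbkBytes)); // false
```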

The code is as follows:

const isUtf8 = (file: File): Promise<boolean> => {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    // onload fires only after a successful read, so result is a string here.
    reader.onload = (): void => {
      const content = reader.result as string;
      // Invalid UTF-8 byte sequences are decoded to U+FFFD (�).
      const encodingRight = content.indexOf("\uFFFD") === -1;

      if (encodingRight) {
        resolve(encodingRight);
      } else {
        reject(new Error("Encoding format error, please upload UTF-8 format file"));
      }
    };

    reader.onerror = () => {
      reject(new Error("File content reading failed, please check if the file is damaged"));
    };

    // Register the handlers first, then start reading the file as UTF-8 text.
    reader.readAsText(file, "utf-8");
  });
};

The problem with this method is that the browser loads the entire file content into memory. If the file is very large, for example several GB, the FileReader instance fires onerror, and sometimes the browser crashes outright.

Large file solution

For large files, you can sample the content instead of reading all of it. Files smaller than 50 MB are still read whole. Larger files are divided into 100 slices, and only the first 1 KB of each slice is read as a string. If the 1024-byte cut falls in the middle of a multi-byte character, such as a Chinese character, decoding produces � at the beginning or end of the string, and the segment would be wrongly judged as non-UTF-8. To avoid this, skip the first few characters of the decoded string, take roughly the first half of it, and check only that part for �.

The constants above (slice count, sample size, and size threshold) can be adjusted according to requirements.
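To make those constants easy to tune, the offset arithmetic can be separated from the file handling. This is a minimal sketch under stated assumptions: `getSampleRanges` and its parameter names are hypothetical and not taken from the article, and the defaults mirror the values described in the text (100 samples, 1 KB each, 50 MB threshold).

```typescript
interface SampleRange {
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

// Hypothetical helper: returns `count` evenly spaced byte ranges of at most
// `sampleSize` bytes each. Files smaller than `threshold` yield a single
// range covering the whole file.
function getSampleRanges(
  fileSize: number,
  count = 100,
  sampleSize = 1024,
  threshold = 50 * 1024 * 1024,
): SampleRange[] {
  if (fileSize < threshold) {
    return [{ start: 0, end: fileSize }];
  }
  const stride = Math.floor(fileSize / count);
  const ranges: SampleRange[] = [];
  for (let i = 0; i < count; i++) {
    const start = i * stride;
    ranges.push({ start, end: Math.min(start + sampleSize, fileSize) });
  }
  return ranges;
}

const ranges = getSampleRanges(100 * 1024 * 1024);
console.log(ranges.length); // 100
console.log(ranges[1]); // { start: 1048576, end: 1049600 }
```

Each range can then be turned into a Blob with `file.slice(range.start, range.end)`.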

The code is as follows:

const getSamples = (file: File): Blob[] => {
  const filesize = file.size;
  const parts: Blob[] = [];
  if (filesize < 50 * 1024 * 1024) {
    // Small file: read it whole.
    parts.push(file);
  } else {
    const total = 100; // number of samples
    const sampleSize = 1024; // bytes read from the start of each slice (1 KB)
    const chunkSize = Math.floor(filesize / total);
    // Take one sample from the start of each of the `total` slices.
    for (let i = 0; i < total; i++) {
      const start = i * chunkSize;
      parts.push(file.slice(start, start + sampleSize));
    }
  }
  return parts;
};

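const isUtf8 = (filePart: Blob, isSample: boolean) => {
  return new Promise<void>((resolve, reject) => {
    const fileReader = new FileReader();

    fileReader.onload = (e) => {
      const str = e.target?.result as string;
      // A sample may start or end in the middle of a multi-byte character, so
      // skip the first few characters and check only roughly the first half.
      // A whole file has no such cut and is checked in full.
      const sampleStr = isSample ? str.slice(4, 4 + Math.floor(str.length / 2)) : str;
      if (sampleStr.indexOf("\uFFFD") === -1) {
        resolve();
      } else {
        reject(new Error("Encoding format error, please upload UTF-8 format file"));
      }
    };

    fileReader.onerror = () => {
      reject(new Error("File content reading failed, please check if the file is damaged"));
    };

    // Register the handlers first, then start reading (UTF-8 by default).
    fileReader.readAsText(filePart);
  });
};

export default async function (file: File) {
  const samples = getSamples(file);
  // More than one part means the file was sampled rather than read whole.
  const isSampled = samples.length > 1;
  let res = true;

  for (const filePart of samples) {
    try {
      await isUtf8(filePart, isSampled);
    } catch (error) {
      // Stop at the first sample that fails the check or cannot be read.
      res = false;
      break;
    }
  }
  return res;
}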
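The boundary problem described earlier, where a sample is cut in the middle of a multi-byte character, can also be handled at the byte level instead of discarding half of the string. The following is a sketch of that alternative, not the method used in the code above: it trims incomplete sequences from both ends of the sample and then decodes with `TextDecoder` in `fatal` mode, which throws on malformed input. The helper names `sequenceLength`, `trimToCharBoundaries`, and `isValidUtf8Sample` are hypothetical. The trimming is a heuristic: stray continuation bytes at the very start of a sample are dropped rather than reported.

```typescript
// Number of bytes in the UTF-8 sequence started by `lead`.
function sequenceLength(lead: number): number {
  if (lead >= 0xf0 && lead <= 0xf4) return 4;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xc2 && lead <= 0xdf) return 2;
  return 1; // ASCII, or a byte that can never start a multi-byte sequence
}

// Drop bytes at both ends that belong to a character cut by slicing.
function trimToCharBoundaries(bytes: Uint8Array): Uint8Array {
  let start = 0;
  // Skip up to 3 leading continuation bytes (bit pattern 10xxxxxx).
  while (start < bytes.length && start < 3 && (bytes[start] & 0xc0) === 0x80) {
    start++;
  }

  let end = bytes.length;
  // Walk back over at most 3 trailing bytes to find the last lead byte.
  for (let i = 1; i <= 3 && end - i >= start; i++) {
    const b = bytes[end - i];
    if ((b & 0xc0) === 0x80) continue; // continuation byte, keep looking
    if (sequenceLength(b) > i) end -= i; // incomplete sequence, cut it off
    break;
  }
  return bytes.subarray(start, end);
}

function isValidUtf8Sample(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(trimToCharBoundaries(bytes));
    return true;
  } catch {
    return false;
  }
}

// "中文字" is 9 bytes in UTF-8; dropping the first and last byte simulates a
// sample that starts and ends in the middle of a character.
const cut = new TextEncoder().encode("中文字").subarray(1, 8);

console.log(isValidUtf8Sample(cut)); // true
console.log(isValidUtf8Sample(new Uint8Array([0x61, 0xd6, 0xd0, 0x61]))); // false
```

In the browser, the bytes of a sample can be obtained with `new Uint8Array(await filePart.arrayBuffer())`.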
