How to detect whether a file is damaged using Apache Tika

Apache Tika is a library for file type detection and content extraction from files of various formats.

When files are uploaded to a server and parsed, you often need to determine whether they are damaged. Tika can be used for this check: if it cannot parse a file, the file is probably damaged.

Add the Maven dependencies as follows:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-app</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

If tika-app conflicts with other jar packages in your project, depend on tika-core and tika-parsers instead:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

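Using Tika to detect whether a file is damaged:

Tika's parse method reports failure through three exceptions. If reading from the input stream fails, it throws an IOException. If the document obtained from the stream cannot be parsed, it throws a TikaException. If the SAX content handler cannot process an event, it throws a SAXException. The parseToString method of the Tika facade, used in the example below, declares only IOException and TikaException.

A TikaException therefore usually indicates that the document is corrupted, although it can also be raised for other reasons, such as a password-protected document.

Example: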

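import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// The class name is arbitrary
public class FileCheck {

  public static void main(String[] args) {
    try {
      // Assumes Test.txt exists at this path; adjust it for your system
      File file = new File("D:\\Test.txt");
      // true if Tika parsed the file, false if it could not
      boolean result = isParseFile(file);
    } catch (IOException e) {
      // The file could not be read at all
      e.printStackTrace();
    }
  }

  /**
   * Checks whether Tika can parse the file.
   *
   * @param file the file to check
   * @return true if the file was parsed, false if Tika could not parse it
   * @throws IOException if the file cannot be read
   */
  private static boolean isParseFile(File file) throws IOException {
    try {
      Tika tika = new Tika();
      String filecontent = tika.parseToString(file);
      System.out.println(filecontent);
      return true;
    } catch (TikaException e) {
      // Tika could not parse the document, so treat it as damaged
      return false;
    }
  }
}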

Output:

Test data---read text content
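
The example above treats only a TikaException as a sign of damage. If you also want to tell a read failure apart from a parse failure, one option is to map each exception type to its own result. The following is only a minimal sketch under stated assumptions: the Outcome enum, the ParseAction interface and the classify method are hypothetical names invented for illustration and are not part of Tika. To keep the sketch free of library dependencies, the parser call is passed in as a ParseAction, and the parse failures are simulated in main.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.xml.sax.SAXException;

public class ParseOutcomeSketch {

    /** Hypothetical result categories for this sketch; not part of Tika. */
    enum Outcome { OK, READ_FAILED, HANDLER_FAILED, UNPARSEABLE }

    /** Hypothetical stand-in for a parser call, so the sketch needs no Tika jar. */
    interface ParseAction {
        void parse(InputStream in) throws Exception;
    }

    /** Runs the action and maps the exception type, if any, to an Outcome. */
    static Outcome classify(ParseAction action, InputStream in) {
        try {
            action.parse(in);
            return Outcome.OK;
        } catch (IOException e) {
            return Outcome.READ_FAILED;      // the stream could not be read
        } catch (SAXException e) {
            return Outcome.HANDLER_FAILED;   // the content handler rejected an event
        } catch (Exception e) {
            // Any other exception ends up here, which is where a
            // TikaException would land (it extends Exception)
            return Outcome.UNPARSEABLE;
        }
    }

    public static void main(String[] args) {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        // Simulated actions; a real one would call into Tika instead
        System.out.println(classify(in -> { }, empty));
        System.out.println(classify(in -> { throw new IOException("disk error"); }, empty));
        System.out.println(classify(in -> { throw new SAXException("bad event"); }, empty));
        System.out.println(classify(in -> { throw new Exception("corrupt"); }, empty));
    }
}
```

With Tika on the classpath, the action could be something like in -> new Tika().parseToString(in). Only the UNPARSEABLE result would then point to a damaged file, while READ_FAILED suggests retrying the upload or checking the storage. Note that the last catch block also catches runtime exceptions, so a stricter version would catch TikaException explicitly.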

Summary

The above is a method for detecting whether a file is damaged using Apache Tika. I hope it is helpful to you.
