Nginx anti-crawler strategy: blocking crawlers by User-Agent (UA)

Create the anti-crawler policy file:

vim /usr/www/server/nginx/conf/anti_spider.conf

File Contents

# Block scraping tools such as Scrapy (~* is a case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}
# Block the listed UAs and empty UAs (~ is a case-sensitive match; ^$ matches an empty UA)
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|FlightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$" ) {
    return 403;
}
# Block request methods other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
# Block a single IP:
#deny 123.45.6.7;
# Block the whole range from 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# Block the range from 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# Block the range from 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
# The following IPs are all rogue:
#deny 58.95.66.0/24;
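
To see how the two User-Agent rules above behave before deploying them, the matching can be approximated locally. The following is only a sketch: `ua_blocked` is a hypothetical helper name, `grep -E` stands in for nginx's own regex engine (for plain alternations like these the two should agree), and the second pattern is shortened to a few entries from the list.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the two User-Agent rules from
# anti_spider.conf so a UA string can be checked without nginx.

# Rule 1: case-insensitive match (nginx "~*")
TOOLS_RE='Scrapy|Curl|HttpClient'
# Rule 2: case-sensitive match (nginx "~"); shortened list for illustration,
# "^$" matches an empty User-Agent
UA_RE='WinHttp|WebZIP|FetchURL|YYSpider|MJ12bot|AhrefsBot|Python-urllib|^$'

# Returns 0 if the UA in $1 would be blocked, 1 if it would be allowed.
ua_blocked() {
    printf '%s\n' "$1" | grep -Eiq "$TOOLS_RE" && return 0
    printf '%s\n' "$1" | grep -Eq "$UA_RE" && return 0
    return 1
}
```

For example, `ua_blocked 'YYSpider' && echo blocked` reports a match, while `ua_blocked 'yyspider'` does not, because the second rule is case-sensitive.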

Configuration Usage

Include the file in the site's server block:

# Anti-crawler
include /usr/www/server/nginx/conf/anti_spider.conf;

Finally, restart nginx so that the change takes effect.
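
Before restarting, it can help to let nginx check the configuration first, so that a typo in anti_spider.conf does not take the site down. A minimal sketch, assuming the nginx binary is on the PATH (the `NGINX_BIN` variable and the `reload_nginx` function name are illustrative, not part of the original setup):

```shell
#!/bin/sh
# Sketch: validate the configuration, then reload nginx.
# NGINX_BIN is an assumption; point it at your own nginx binary if needed.
NGINX_BIN="${NGINX_BIN:-nginx}"

reload_nginx() {
    # "nginx -t" parses the configuration (including included files)
    # and reports syntax errors without touching the running server
    if "$NGINX_BIN" -t; then
        # "-s reload" signals the master process to re-read the configuration
        "$NGINX_BIN" -s reload
    else
        echo "configuration test failed; nginx not reloaded" >&2
        return 1
    fi
}
```

Running `reload_nginx` as a user with permission to signal nginx applies the new rules only when the test passes.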

Verify that the rules take effect

Simulating YYSpider (on the block list, so 403 is expected)

λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2

Simulating Baiduspider (not on the block list, so 200 is expected)

λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
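
The two manual checks above can be repeated for several User-Agents at once. The helper below is a rough sketch rather than part of the original setup: `check_ua` is a hypothetical name, and it sends a normal GET request and prints only the final status code.

```shell
#!/bin/sh
# Sketch: print the HTTP status code a URL returns for a given User-Agent.
# curl options used: -s (silent), -o /dev/null (discard the body),
# -w '%{http_code}' (print the final status code), -A (set the User-Agent).
check_ua() {
    url=$1
    ua=$2
    code=$(curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url")
    printf '%s -> %s\n' "$ua" "$code"
}

# Example loop (performs real requests, so it is left commented out):
# for ua in 'YYSpider' 'BaiduSpider' 'Scrapy/2.0'; do
#     check_ua https://www.myong.top "$ua"
# done
```

With the rules active, blocked User-Agents should report 403 and allowed ones 200, matching the responses shown above.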

Common crawler User-Agents

User-Agent               Description
FeedDemon                content scraping
BOT/0.1 (BOT for JCE)    SQL injection
CrawlDaddy               SQL injection
Java                     content scraping
Jullo                    content scraping
Feedly                   content scraping
UniversalFeedParser      content scraping
ApacheBench              CC attack tool
Swiftbot                 useless crawler
YandexBot                useless crawler
AhrefsBot                useless crawler
YisouSpider              useless crawler (since acquired by UC Shenma Search; this spider can be allowed)
jikeSpider               useless crawler
MJ12bot                  useless crawler
ZmEu                     phpMyAdmin vulnerability scanning
WinHttp                  scraping, CC attacks
EasouSpider              useless crawler
HttpClient               TCP attacks
Microsoft URL Control    scanning
YYSpider                 useless crawler
jaunty                   WordPress brute-force scanner
oBot                     useless crawler
Python-urllib            content scraping
Indy Library             scanning
FlightDeckReports Bot    useless crawler
Linguee Bot              useless crawler
