Building a Scrapy crawler image from a Dockerfile based on Alpine

1. Download the alpine image

[root@DockerBrian ~]# docker pull alpine
Using default tag: latest
Trying to pull repository docker.io/library/alpine ...
latest: Pulling from docker.io/library/alpine
4fe2ade4980c: Pull complete
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Downloaded newer image for docker.io/alpine:latest
[root@docker43 ~]# docker images
REPOSITORY        TAG     IMAGE ID      CREATED      SIZE
docker.io/alpine  latest  196d12cf6ab1  3 weeks ago  4.41 MB

2. Write Dockerfile

Create a scrapy directory to hold the Dockerfile

[root@DockerBrian ~]# mkdir /opt/alpineDockerfile/
[root@DockerBrian ~]# cd /opt/alpineDockerfile/
[root@DockerBrian alpineDockerfile]# mkdir scrapy && touch scrapy/Dockerfile
[root@DockerBrian alpineDockerfile]# cd scrapy/
[root@DockerBrian scrapy]# ll
total 4
-rw-r--r-- 1 root root 1394 Oct 10 11:36 Dockerfile

Write the following into the Dockerfile

# Specify the base image
FROM alpine

# Author information
MAINTAINER alpine_python3_scrapy ([email protected])

# Switch to the Alibaba Cloud mirror
RUN echo "http://mirrors.aliyun.com/alpine/latest-stable/main/" > /etc/apk/repositories && \
  echo "http://mirrors.aliyun.com/alpine/latest-stable/community/" >> /etc/apk/repositories

# Update the package index, install openssh, modify the configuration file,
# generate the host keys and synchronize the time
RUN apk update && \
  apk add --no-cache openssh-server tzdata && \
  cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
  sed -i "s/#PermitRootLogin.*/PermitRootLogin yes/g" /etc/ssh/sshd_config && \
  ssh-keygen -t rsa -P "" -f /etc/ssh/ssh_host_rsa_key && \
  ssh-keygen -t ecdsa -P "" -f /etc/ssh/ssh_host_ecdsa_key && \
  ssh-keygen -t ed25519 -P "" -f /etc/ssh/ssh_host_ed25519_key && \
  echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd

# Install Scrapy dependency packages (required dependencies)
RUN apk add --no-cache python3 python3-dev gcc openssl-dev openssl libressl libc-dev linux-headers libffi-dev libxml2-dev libxml2 libxslt-dev openssh-client openssh-sftp-server

# Install the pip packages the environment needs (packages here can be added or removed as needed)
RUN pip3 install --default-timeout=100 --no-cache-dir --upgrade pip setuptools pymysql pymongo redis scrapy-redis ipython Scrapy requests

# Create the ssh startup script
RUN echo "/usr/sbin/sshd -D" >> /etc/start.sh && \
  chmod +x /etc/start.sh

# Open port 22
EXPOSE 22

# Execute the ssh startup command
CMD ["/bin/sh","/etc/start.sh"]

The container runs an SSH service, started by the start.sh script, so you can log in remotely and use the Scrapy installation in its Python 3 environment.
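The startup script is easier to follow when it is recreated outside the image. The sketch below writes the same one-line start.sh into a directory of your choice. The function name make_start_script and the target directory are illustrative assumptions and are not part of the original image.

```shell
# Recreate the start script that the Dockerfile writes to /etc/start.sh.
# make_start_script is a hypothetical helper; $1 is any writable directory.
make_start_script() {
  target="$1/start.sh"
  # Same command as in the Dockerfile: -D keeps sshd in the foreground,
  # so the container's main process does not exit.
  # (The Dockerfile appends with >>, which gives the same result on a fresh image.)
  echo "/usr/sbin/sshd -D" > "$target"
  chmod +x "$target"
  # Print the path of the script that was written.
  echo "$target"
}
```

The script has no shebang line, so running it through /bin/sh, as the CMD instruction does, is the reliable way to start it.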

3. Create an image

Build the image

[root@DockerBrian scrapy]# docker build -t scrapy_redis_ssh:v1 . 

View the image

[root@DockerBrian scrapy]# docker images
REPOSITORY        TAG     IMAGE ID      CREATED      SIZE
scrapy_redis_ssh  v1      b2c95ef95fb9  4 hours ago  282 MB
docker.io/alpine  latest  196d12cf6ab1  4 weeks ago  4.41 MB
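The build and the check of the image list can be combined into one step. This is a minimal sketch, assuming a helper named build_and_verify (a hypothetical name) that takes the image tag and the build context as arguments. It relies only on docker build -t and docker images -q.

```shell
# Build an image and confirm that it is listed afterwards.
# build_and_verify is a hypothetical helper: $1 = image name:tag, $2 = build context.
build_and_verify() {
  image="$1"
  context="${2:-.}"
  # Send build output to stderr so that stdout carries only the image ID.
  docker build -t "$image" "$context" >&2 || return 1
  # "docker images -q" prints only the image ID, or nothing if there is no match.
  id=$(docker images -q "$image")
  if [ -z "$id" ]; then
    echo "image $image not found after build" >&2
    return 1
  fi
  echo "$id"
}
```

Called as build_and_verify scrapy_redis_ssh:v1 . from the scrapy directory, it prints the image ID on success and returns a non-zero status otherwise.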

4. Create a container

Create a container named scrapy10086, mapping the container's SSH port 22 to port 10086 on the host

docker run -itd --restart=always --name scrapy10086 -p 10086:22 scrapy_redis_ssh:v1

View the container

[root@DockerBrian scrapy]# docker ps
CONTAINER ID  IMAGE         COMMAND                 CREATED      STATUS      PORTS                  NAMES
7fb9e69d79f5  b2c95ef95fb9  "/bin/sh /etc/star..."  3 hours ago  Up 3 hours  0.0.0.0:10086->22/tcp  scrapy10086
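The PORTS column shows the mapping, but a script can also look it up with docker port. The sketch below assumes a hypothetical helper named host_ssh_port that takes the container name and prints the host port that container port 22 is published on.

```shell
# Print the host port that a container's port 22 is published on.
# host_ssh_port is a hypothetical helper; $1 = container name or ID.
host_ssh_port() {
  # "docker port NAME 22" prints mappings such as 0.0.0.0:10086, one per line.
  mapping=$(docker port "$1" 22 | head -n 1)
  [ -n "$mapping" ] || return 1
  # Keep only the text after the last colon, i.e. the port number.
  echo "${mapping##*:}"
}
```

For the container created above, host_ssh_port scrapy10086 should print the port that was passed to -p, which is the one to use with ssh.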

Log in to the container

[root@DockerBrian scrapy]# ssh root@127.0.0.1 -p 10086
The authenticity of host '[127.0.0.1]:10086 ([127.0.0.1]:10086)' can't be established.
ECDSA key fingerprint is SHA256:wC46AU6SLjHyEfQWX6d6ht9MdpGKodeMOK6/cONcpxk.
ECDSA key fingerprint is MD5:6a:b7:31:3c:63:02:ca:74:5b:d9:68:42:08:be:22:fc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[127.0.0.1]:10086' (ECDSA) to the list of known hosts.
root@127.0.0.1's password: # The password is the one set in the Dockerfile by echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd
Welcome to Alpine!
 
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org>.
 
You can setup the system with the command: setup-alpine
 
You may change this message by editing /etc/motd.
 
7363738cc96a:~#
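Typing the address, port and user on every login gets tedious, so the connection details can be stored in the ssh client configuration. This is a minimal sketch: the helper add_ssh_alias and the alias name scrapy10086 are illustrative assumptions, while HostName, Port and User are standard ssh_config keywords.

```shell
# Append a host alias for the container to an ssh client config file.
# add_ssh_alias and the alias name "scrapy10086" are illustrative assumptions.
# $1 = config file to append to (for example ~/.ssh/config).
add_ssh_alias() {
  cat >> "$1" <<'EOF'
Host scrapy10086
    HostName 127.0.0.1
    Port 10086
    User root
EOF
}
```

After add_ssh_alias ~/.ssh/config, the command ssh scrapy10086 connects with the same address, port and user as the full command shown earlier.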

5. Testing

Create a Scrapy project named test

7363738cc96a:~# scrapy startproject test
New Scrapy project 'test', using template directory '/usr/lib/python3.6/site-packages/scrapy/templates/project', created in:
  /root/test
 
You can start your first spider with:
  cd test
  scrapy genspider example example.com
7363738cc96a:~# cd test/
7363738cc96a:~/test# ls
scrapy.cfg test
7363738cc96a:~/test# cd test/
7363738cc96a:~/test/test# ls
__init__.py __pycache__ items.py middlewares.py pipelines.py settings.py spiders
7363738cc96a:~/test/test#
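The manual ls check can also be scripted. The sketch below assumes a hypothetical helper named check_scrapy_project that looks for the files shown in the listing above. It does not run Scrapy itself and only confirms that the generated layout is present.

```shell
# Check that a directory looks like a freshly generated Scrapy project.
# check_scrapy_project is a hypothetical helper:
# $1 = project directory, $2 = project name.
check_scrapy_project() {
  root="$1"
  name="$2"
  [ -f "$root/scrapy.cfg" ] || { echo "missing scrapy.cfg" >&2; return 1; }
  # Files that "scrapy startproject" placed in the inner package above.
  for f in __init__.py items.py middlewares.py pipelines.py settings.py; do
    [ -f "$root/$name/$f" ] || { echo "missing $name/$f" >&2; return 1; }
  done
  [ -d "$root/$name/spiders" ] || { echo "missing $name/spiders" >&2; return 1; }
  echo "ok"
}
```

Inside the container, check_scrapy_project /root/test test prints ok when every expected file is in place and reports the first missing file otherwise.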

The test succeeded.
