建站程序:zsky,可以用来自建磁力搜索网站

2723 人围观资源分享

网址:https://github.com/wenguonideshou/zsky

说明文档:

说明:
在ssbc爬虫的基础上修复,现在可以7*24爬取的爬虫,修改了爬取策略,只入库音乐、电影、电子书。python实现的磁力搜索网站,代码比较烂,请轻喷!
搜索排行榜、浏览排行榜、DMCA投诉的功能未完成(其实是不想做)
和ssbc相比,没使用sphinx进行索引,而是用redis缓存访问页面,使用jieba分词,比sphinx的中文分词效果好。
模板在templates目录,模板引擎是jinja2(非常易读),编写自己的专属模板非常方便,中文版文档 http://docs.jinkan.org/docs/jinja2/ 。
后台可以直接搜索、删除DMCA投诉的关键字,管理首页推荐关键字、用户搜索记录、查看每天爬取的资源数量、管理后台用户。
修改数据库密码后请修改manage.py里面的mysql+pymysql://root:后面的内容和simdht_work.py里面的DB_PASS
实验环境:centos7 python2.7
理论上支持python3,以及Ubuntu Debian等系统

安装:
文件上传到主机上
tar zxvf zsky.tar.gz
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop iptables.service
systemctl disable iptables.service
setenforce 0
sed -i s/”SELINUX=enforcing”/”SELINUX=disabled”/g /etc/sysconfig/selinux
#关闭selinux
cat << EOF > /etc/sysctl.conf
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_wmem = 8192 131072 16777216
net.ipv4.tcp_rmem = 32768 131072 16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_conntrack_max = 65536
net.ipv4.netfilter.ip_conntrack_max=65536
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=180
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
EOF
/sbin/sysctl -p /etc/sysctl.conf
/sbin/sysctl -w net.ipv4.route.flush=1
echo ulimit -HSn 65536 >> /etc/rc.local
echo ulimit -HSn 65536 >>/root/.bash_profile
ulimit -HSn 65536
#优化内核参数,优化打开文件数
cd zsky
yum -y install wget gcc gcc-c++ python-devel mariadb mariadb-devel mariadb-server
yum -y install epel-release python-pip redis
pip install -r requirements.txt
pip install redis
systemctl start mariadb.service
systemctl enable mariadb.service
systemctl start redis.service
systemctl enable redis.service
mysql -uroot -e”create database zsky default character set utf8mb4;”
#utf8mb4比utf8更优秀
mysql -uroot -e”set global interactive_timeout=31536000;set global wait_timeout=31536000;set global max_allowed_packet = 64*1024*1024;set global max_connections = 10000;”
#建立并优化mysql数据库参数
python manage.py init_db
#建表
python manage.py create_user
#按照提示输入用户名、密码、邮箱
nohup gunicorn -k gevent –access-logfile zsky.log –error-logfile zsky_err.log manage:app -b 0.0.0.0:80 –reload>/dev/zero 2>&1&
#开启网站访问,访问日志是当前目录下zsky.log,错误日志是当前目录下zsky_err.log
#如果不想要日志 就运行下面这条命令
#nohup gunicorn -k gevent manage:app -b 0.0.0.0:80 –reload>/dev/zero 2>&1&
nohup python simdht_worker.py 2>&1&
#开启爬虫并写日志,如果爬虫有问题请提交日志文件nohup.out给我

现在应该能访问http://IP 了,解析域名即可完成部署
后台地址http://IP/admin

#开机自启动
chmod +x /etc/rc.d/rc.local
echo “systemctl start mariadb.service” >> /etc/rc.d/rc.local
echo “systemctl start redis.service” >> /etc/rc.d/rc.local
echo “cd /root/zsky” >> /etc/rc.d/rc.local
echo “nohup python simdht_worker.py >/dev/zero 2>&1&” >> /etc/rc.d/rc.local
echo “nohup gunicorn -k gevent manage:app -b 0.0.0.0:80 –reload>/dev/zero 2>&1&” >> /etc/rc.d/rc.local

1 个回答

..这个爬虫没限制抓取时候连接数。。我挂服务器上直接连接数过多被暂停了…

You are viewing 1 out of 1 answers, click here to view all answers.

Login

欢迎!请登录你的账号。

记住我 忘记密码?

还未注册 注册

Lost Password

Register

返回顶部