
Installing pyspider on Linux (CentOS 7) [Summary Version]

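2015-10-01 20:01
Over the National Day holiday I rented a new Alibaba Cloud server and needed pyspider on it for crawling, but the installation did not go smoothly. This post records the installation steps and how I resolved each error.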

[1] First make sure pip is installed on the system. If it is not, search online for detailed instructions; here I only list the commands I used:

wget https://pypi.python.org/packages/source/p/pip/pip-7.1.2.tar.gz#md5=3823d2343d9f3aaab21cf9c917710196
tar -xvf pip-7.1.2.tar.gz
cd pip-7.1.2
python setup.py install
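
Before moving on, it can help to confirm that pip is actually reachable. The following is a minimal sketch, assuming a Python 3 interpreter is available to run it (the system Python in this post was 2.7); the function name pip_available is hypothetical and only for illustration.

```python
import shutil
import subprocess
import sys


def pip_available():
    """Return True if a `pip` executable is on PATH or `python -m pip` runs."""
    if shutil.which("pip"):
        return True
    # Fall back to asking the current interpreter to run the pip module.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("pip found" if pip_available() else "pip missing; install it first")
```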


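[2] With pip installed, pyspider can be installed directly. Run: pip install pyspider

It failed with errors. Each error I ran into is recorded below:

(1) Error one: pip printed a warning while running, and installing Flask failed, as shown below: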

[root@iZ28jyxu47dZ fancy]# pip install pyspider

Collecting pyspider

/usr/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL
connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning

Downloading pyspider-0.3.5.tar.gz (94kB)

100% |████████████████████████████████| 98kB 41kB/s

Collecting Flask>=0.10 (from pyspider)

Downloading Flask-0.10.1.tar.gz (544kB)

15% |████▉ | 81kB 250bytes/s eta 0:30:44

Hash of the package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/) (e11c5569eb68d582ce1c85154b9b48c9) doesn't match the expected hash 378670fe456957eb3c27ddaef60b2b24!

Bad md5 hash for package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/)

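The warning arises because urllib3 cannot configure SSL properly, so SSL connections may fail. The fix is to install the missing dependencies.

References: /article/1763120.html
https://www.phodal.com/blog/python-pip-openssl-issue/
Commands: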

yum install python-devel libffi-devel openssl-devel
pip install pyopenssl ndg-httpsclient pyasn1


(Note: Ubuntu does not use yum; use apt-get there instead. The package names also differ on Ubuntu.)

Even after this, pip install pyspider still did not work, because the step above only cleared the InsecurePlatformWarning that pip printed.
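
To see whether an interpreter is affected by this warning, one rough check is whether the standard ssl module exposes SSLContext, since the warning text says that "a true SSLContext object is not available". This is only a sketch under that assumption, not how pip or urllib3 diagnose the problem; the helper names are hypothetical.

```python
import ssl
import sys


def has_true_sslcontext():
    """Return True if the standard library provides ssl.SSLContext."""
    return hasattr(ssl, "SSLContext")


def describe():
    """Build a one-line report for the running interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    if has_true_sslcontext():
        return "Python %s: ssl.SSLContext is available" % version
    # Assumption: without SSLContext, urllib3 falls back and emits the warning.
    return "Python %s: no ssl.SSLContext, expect InsecurePlatformWarning" % version


if __name__ == "__main__":
    print(describe())
```

Run it with the same python that pip uses; the script works on both Python 2.7 and Python 3.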

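Flask still could not be installed, so the only option left was to install it manually. I also took a detour searching for "bad md5 hash for package"; I will not paste those results here.

Reference: http://www.169it.com/tech-python/article-539019800.html After installing the related packages, running: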

easy_install flask


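installed Flask successfully. This suggests that once pip install flask has failed, a retry may simply read the package from the cache again, so it keeps failing even after the dependencies are in place. In that situation, installing with easy_install may be a good alternative.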

[3] Run pip install pyspider again.

Everything went smoothly until lxml failed to install. The key error messages are pasted below:

# Message one #:

Installing collected packages: chardet, cssselect, lxml, pyquery, requests, certifi, tornado, Flask-Login, u-msgpack-python, click, pyspider

Found existing installation: chardet 2.0.1

DEPRECATION: Uninstalling a distutils installed project (chardet) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.

Uninstalling chardet-2.0.1:

Successfully uninstalled chardet-2.0.1

Running setup.py install for chardet

Running setup.py install for cssselect

Running setup.py install for lxml

Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt
--single-version-externally-managed --compile:

/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'

warnings.warn(msg)

Building lxml version 3.4.4.

Building without Cython.

ERROR: /bin/sh: xslt-config: command not found

** make sure the development packages of libxml2 and libxslt are installed **

# Message two #:

gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/tmp/pip-build-dtraef/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o
-w

In file included from src/lxml/lxml.etree.c:239:0:

/tmp/pip-build-dtraef/lxml/src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory

#include "libxml/xmlversion.h"

^

compilation terminated.

error: command 'gcc' failed with exit status 1

----------------------------------------

Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt
--single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-dtraef/lxml

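Message one shows that all of pyspider's dependencies had been downloaded and the install phase had started; chardet and cssselect installed successfully, and lxml was the one that failed. The error text suggests that the development packages of libxml2 and libxslt were missing. Message two suggests that gcc might also be at fault. I started with message two:

Reference: /article/4398003.html

So I ran yum install python-dev gcc to reinstall python-dev and gcc (on CentOS the package is actually named python-devel; python-dev is the Debian/Ubuntu name). Running pip install lxml afterwards produced the same messages, which meant the real problem had to be the one in message one. (I am very glad such install logs exist to consult; otherwise I would have had no idea what went wrong!)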
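
Rather than reinstalling packages by guesswork, one could first check which build tools are actually missing. This is a minimal sketch, assuming Python 3 and assuming that the lxml source build relies on gcc, xml2-config and xslt-config being on PATH (the log above names xslt-config explicitly); the names REQUIRED_TOOLS and missing_tools are hypothetical.

```python
import shutil

# Tools the lxml source build is assumed to call; xslt-config appears in the log.
REQUIRED_TOOLS = ("gcc", "xml2-config", "xslt-config")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing build tools: " + ", ".join(missing))
    else:
        print("all build tools found")
```

If gcc is found but xslt-config is not, the compiler is not the culprit and the libxml2/libxslt development packages are the place to look.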

References: http://stackoverflow.com/questions/5178416/pip-install-lxml-error

/article/1657615.html

Run:

yum install libxslt-devel libxml2-devel


Then run:

pip install lxml


The installation succeeded!
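
A quick way to confirm that lxml was built against the freshly installed libraries is to import it and print the library versions it reports. The sketch below returns None when lxml is not importable; the function name lxml_versions is hypothetical.

```python
def lxml_versions():
    """Return lxml and underlying library versions, or None if lxml is absent."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2": etree.LIBXML_VERSION,
        "libxslt": etree.LIBXSLT_VERSION,
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed for this interpreter")
    else:
        for name, version in sorted(versions.items()):
            print(name, ".".join(str(part) for part in version))
```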

[4] At this point, running pip install pyspider finally succeeded! Enjoy your crawling!

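If you want to see my installation process in full detail, see this other post of mine:

Installing pyspider on Linux: the detailed process and commands [No-Summary Version]: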

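[Summary]

1. Pay close attention to the messages printed during installation; they are the clues for tracking down the problem.

2. Try to understand the principle behind each step and what each command means; otherwise you may end up installing a pile of useless things on your system.

3. You can search for the error message itself, which seems more efficient and more targeted.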
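
As an illustration of points 1 and 3, a long pip log can be filtered down to the lines worth searching for. This is a minimal sketch: the pattern list is an assumption based only on the messages seen in this post, and the names ERROR_PATTERN, error_lines and SAMPLE_LOG are hypothetical.

```python
import re

# Phrases that marked the real problems in the logs quoted in this post.
ERROR_PATTERN = re.compile(
    r"\berror\b|command not found|no such file or directory|bad md5 hash",
    re.IGNORECASE,
)

# Excerpt of the lxml build output shown earlier.
SAMPLE_LOG = """\
Building lxml version 3.4.4.
Building without Cython.
ERROR: /bin/sh: xslt-config: command not found
** make sure the development packages of libxml2 and libxslt are installed **
/tmp/pip-build-dtraef/lxml/src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
"""


def error_lines(log_text):
    """Return the stripped log lines that match one of the error phrases."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if ERROR_PATTERN.search(line)
    ]


if __name__ == "__main__":
    for line in error_lines(SAMPLE_LOG):
        print(line)
```

The first matching line is often the best search query, since later errors (such as the gcc failure here) can be consequences of it.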