
How-to: resolve the Spark "/usr/bin/python: No module named pyspark" issue

2015-08-05 13:39
Error:

Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /home/hadoop/tmp/nm-local-dir/usercache/chenfangfang/filecache/43/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)

Root cause:
I am using JDK 1.7.0_45. PySpark on YARN has a known issue that makes pyspark fail when the Spark assembly jar is built with JDK 7: https://issues.apache.org/jira/browse/SPARK-1520 (JDK 7's jar tool packages large jars in a zip variant that Python's zipimport cannot read).
The stock CDH 5.4.1 Spark did not show this issue: although CDH 5.4.1 announced JDK 1.7.0_45 as its JDK, its Spark assembly was built with JDK 6.
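A quick way to confirm this root cause on a worker node is to check whether Python's zipimport can open the assembly jar at all. A minimal sketch, assuming the jar path reported in the PYTHONPATH above (adjust to your environment):

# If the jar was packaged by JDK 7's jar tool in a zip variant that
# zipimport does not understand, constructing the importer raises
# ZipImportError, which matches the "No module named pyspark" failure.
import zipimport

# Illustrative path: use the jar shown in your own PYTHONPATH.
JAR = "/home/hadoop/tmp/nm-local-dir/usercache/chenfangfang/filecache/43/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar"

try:
    zipimport.zipimporter(JAR)
    print("zipimport can read the jar; the assembly format is fine")
except zipimport.ZipImportError as exc:
    print("zipimport cannot read the jar: %s" % exc)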

Solution:
Rebuilding Spark itself with JDK 6 is not reasonable for us, as the build runs into its own issues. One workable alternative is to repackage the existing assembly jar with JDK 6's jar tool:

unzip -d foo spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar

cd foo

$JAVA6_HOME/bin/jar cvmf META-INF/MANIFEST.MF ../spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar .

Don't neglect the dot at the end of the jar command: it tells jar to package the whole current directory (the extracted jar contents) back into the assembly.
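After repackaging, you can smoke-test the fix from plain Python before rerunning the job, since PySpark on YARN imports the pyspark package directly out of the assembly jar. A minimal sketch, assuming the rebuilt jar sits at the path used above (adjust to your layout):

# PySpark on YARN puts the assembly jar on PYTHONPATH and imports
# pyspark from inside it via zipimport, so a plain-Python import
# against the jar is a faithful smoke test of the repackaged archive.
import sys

# Illustrative path: the jar repackaged with JDK 6's jar tool above.
sys.path.insert(0, "spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar")

import pyspark  # fails with ImportError if the jar is still unreadable
print("pyspark imported from: " + pyspark.__file__)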