Nutch crawl fails with Exception in thread "main" java.io.IOException: Job failed!
2013-09-18 14:15
Running a crawl with Nutch 1.2 under Cygwin fails with an IOException:
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 10
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 10
Injector: starting at 2011-10-10 15:19:26
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:143)
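Most solutions suggested online amount to switching to Nutch 0.9. After a lot of searching I finally found the actual cause: the language (locale) setting. The fixes are as follows: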
Method 1:
Reinstall Cygwin from a mirror outside China (the 163 mirror serves a Chinese-localized build).
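Method 2:
Running set LANG=en_US directly in the Cygwin shell has no effect. In bash, set NAME=value does not assign a variable (it only replaces the positional parameters); export LANG=en_US is the form that assigns it. The Cygwin install used at this point also appeared to have no locale command.
One way around this is to open "System Properties > Advanced > Environment Variables" in Windows and add an environment variable named LANG with the value en_US.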
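The difference between set and export can be checked with a few lines of POSIX shell. This is only a sketch for illustration; the variable names after_set, first_arg, after_export and child_sees are made up for this example and are not part of Nutch or Cygwin.

```shell
# Start from a clean state so the demonstration is reproducible.
unset LANG

# `set` followed by a plain word replaces the positional parameters;
# it does not create a variable named LANG.
set LANG=en_US
after_set="${LANG:-unset}"      # LANG is still not set
first_arg="$1"                  # the word ended up in $1 instead

# `export` assigns the variable and passes it on to child processes.
export LANG=en_US
after_export="$LANG"
child_sees="$(sh -c 'echo "$LANG"')"

echo "after set:    LANG=$after_set, \$1=$first_arg"
echo "after export: LANG=$after_export, child process sees $child_sees"
```

A variable assigned with export this way only lasts for the current shell session, which is why the Windows environment variable (or ~/.bashrc, as in Method 3) is used to make it permanent.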
Method 3:
To make the Cygwin shell use an English interface, just add one line to ~/.bashrc:
export LANG='en_US'
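To use a different interface language, replace en_US with the locale code for that language, for example zh_CN. The full form of LANG is actually "region.encoding", so to choose the encoding yourself you also append it after zh_CN. Using GBK as the example, the value becomes zh_CN.GBK.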
Here we want an English interface that can still display Chinese, so LANG is set to en_US.GBK.
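To make the "region.encoding" structure concrete, here is a small sketch that splits a LANG value at the first dot with POSIX parameter expansion. The function name split_locale is hypothetical, and the sketch ignores the optional @modifier suffix that some locale names carry.

```shell
# Split a locale name of the form region.encoding into its two parts.
split_locale() {
    name="$1"
    region="${name%%.*}"            # everything before the first dot
    case "$name" in
        *.*) encoding="${name#*.}" ;;   # everything after the first dot
        *)   encoding="" ;;             # no encoding given, e.g. plain en_US
    esac
    echo "region=$region encoding=$encoding"
}

split_locale zh_CN.GBK
split_locale en_US.GBK
split_locale en_US
```

For en_US.GBK the region part selects English messages while the encoding part selects GBK, which is what lets Chinese text display correctly.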
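That completes the interface language change. Simple, right? Not quite. Under Cygwin, this setting alone makes programs such as vi behave strangely, because the language configuration is not finished yet. Running the locale command prints all current language settings, and there are quite a few LC_* categories that could be set individually. Among them is LC_ALL, which overrides the others, so setting it should be enough. Add one more line to .bashrc: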
export LC_ALL='en_US.GBK'
You can run a command such as df before and after the change to check whether its output still contains Chinese.
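Putting Method 3 together, the sketch below sets both variables and then checks what a child process would inherit, since programs launched from the shell (such as bin/nutch) receive the exported values. The variable name inherited is made up for this example, and the locale call is guarded because the post reports an install where that command was missing.

```shell
# The two lines assumed to be in ~/.bashrc, as described above.
export LANG='en_US.GBK'
export LC_ALL='en_US.GBK'

# Report the values a child process would inherit from this shell.
inherited="$(sh -c 'echo "LANG=$LANG LC_ALL=$LC_ALL"')"
echo "$inherited"

# `locale` lists every category; with LC_ALL set, the LC_* entries follow it.
if command -v locale >/dev/null 2>&1; then
    locale || echo "locale reported a problem with this setting"
fi
```

After editing ~/.bashrc, open a new Cygwin shell or run source ~/.bashrc so the settings take effect before re-running the crawl.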
References: http://owwlo.com/blog/?p=36#comment-38
http://blog.csdn.net/a221133/article/details/7043318