
Packaging a MapReduce program in Eclipse and submitting it to a Hadoop cluster

2014-11-07 09:20

After getting the program to run on the Hadoop cluster from the command line, I set up the corresponding configuration in Eclipse and clicked Run on Hadoop.

The job ran successfully and the results showed up on HDFS, but it still had not been submitted to the real cluster.

I spent a long time searching for answers and tried specifying the remote JobTracker address directly in the code, without success.

So I debugged the program in Eclipse and, once it ran correctly, packaged it as a jar and uploaded it to the Hadoop cluster to run:

Export the project directly, making sure the jar's META-INF/MANIFEST.MF file contains a Main-Class entry:

Main-Class: WordCount

In practice, simply clicking Next through the export wizard produces a manifest that already contains this entry.
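To confirm what the exported jar actually declares, the manifest can be read back programmatically. The following is a minimal sketch using only the JDK's `java.util.jar` classes; the class name `ManifestCheck` and the default jar path are illustrative assumptions, not part of the original project.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestCheck {
    /** Returns the Main-Class attribute of the given jar, or null if it is absent. */
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when the jar has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real jar path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "myWordCount.jar";
        System.out.println("Main-Class: " + mainClassOf(jarPath));
    }
}
```

Whether the manifest declares a Main-Class matters later, because it affects how the arguments given to `hadoop jar` are interpreted.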

Upload the jar to the server. Assuming it is placed in /opt, the command is:

hadoop jar /opt/myWordCount.jar WordCount /test_in /output12

This failed with:

Exception in thread "main" java.lang.UnsupportedClassVersionError: WordCount : Unsupported major.minor version 52.0

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:270)

at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

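After searching online, I suspected a Java version mismatch: class file version 52.0 corresponds to Java 8, and Eclipse on Windows 7 was using Java 1.8 while the server was running Java 1.7.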
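The version in the error message comes from the class file header: after the 4-byte magic number `0xCAFEBABE` come a 2-byte minor version and a 2-byte major version. As a small sketch (the class name `ClassVersion` is made up for illustration), the following reads that header so a compiled class can be checked before uploading the jar:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersion {
    /** Reads the major version from a class file header. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a Java class file");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version, e.g. 51 or 52
    }

    /** Maps a major version to a Java release number: 51 -> "7", 52 -> "8". */
    static String javaRelease(int major) {
        return String.valueOf(major - 44);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: ClassVersion <path/to/File.class>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int major = majorVersion(in);
            System.out.println("major version " + major + " (Java " + javaRelease(major) + ")");
        }
    }
}
```

A class compiled at the 1.7 level should report major version 51, which a Java 7 runtime can load.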

In Eclipse, go to Window > Preferences > Java > Compiler and set the compiler compliance level to 1.7.

I rebuilt the jar and ran it again.

A new error appeared:

14/11/07 10:33:46 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:47 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:48 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:49 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:50 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:51 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

14/11/07 10:33:52 INFO ipc.Client: Retrying connect to server: hadoop-05/192.168.0.7:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

The ResourceManager could not be reached, even though everything in yarn-site.xml appeared to be configured.
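When a client keeps retrying like this, it can help to check first whether anything is listening on the address in the log at all. The sketch below is a generic TCP reachability test written with the JDK only; the class name `PortCheck` is hypothetical, and the default host and port are simply the ones shown in the retry messages above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the log lines above; override with <host> <port>.
        String host = args.length > 0 ? args[0] : "192.168.0.7";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8032;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

A successful connection only shows that some process accepts connections on that port; it does not prove that the process is the ResourceManager.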

However, I noticed that the port numbers differed from the defaults, so I changed them.

The configuration file now reads:

<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>192.168.0.7</value>
</property>

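I ran the job again and got the same error, so I commented out the job.tracker setting that was explicitly specified in the code.

Surprisingly, a different error appeared: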

Usage: wordcount <in> <out>

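Checking the code showed that this message is printed when the number of input arguments is not exactly two. I could not find anything wrong with the command, so I resorted to hard-coding the paths in the program and rebuilding the jar: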

FileInputFormat.addInputPath(job, new Path("hdfs://192.168.0.7:9000/test_in"));

FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.7:9000/out1"));

After submitting it to the Hadoop cluster, the job finally produced results.

I still have not figured out why passing the paths on the command line does not work. Noting it here for now.
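One plausible explanation, offered as an assumption rather than something verified on this cluster: when the jar's manifest declares a Main-Class, `hadoop jar` takes the class to run from the manifest and passes everything after the jar path to the program. In that case the word `WordCount` on the command line becomes the first program argument, so the program sees three arguments instead of two. The sketch below is a simplified model of that argument handling, not Hadoop's actual code; `RunJarArgs` and `programArgs` are hypothetical names.

```java
import java.util.Arrays;

public class RunJarArgs {
    /**
     * Simplified model: returns the arguments the program's main() would receive.
     * manifestMainClass is the jar's Main-Class entry, or null if there is none.
     * afterJar holds everything typed after the jar path on the command line.
     */
    static String[] programArgs(String manifestMainClass, String[] afterJar) {
        int first = 0;
        if (manifestMainClass == null) {
            first = 1; // no manifest entry: the first word names the class to run
        }
        return Arrays.copyOfRange(afterJar, first, afterJar.length);
    }

    public static void main(String[] args) {
        String[] typed = {"WordCount", "/test_in", "/output12"};

        // Manifest declares Main-Class: WordCount, so the class name is passed through.
        System.out.println(Arrays.toString(programArgs("WordCount", typed)));
        // prints [WordCount, /test_in, /output12]

        // No Main-Class in the manifest: the class name is consumed first.
        System.out.println(Arrays.toString(programArgs(null, typed)));
        // prints [/test_in, /output12]
    }
}
```

If this is indeed the cause, running `hadoop jar /opt/myWordCount.jar /test_in /output12` without the class name, or exporting the jar without a Main-Class entry, would give the program exactly two arguments and make the hard-coded paths unnecessary.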