
Setting Up a Hadoop Cluster Development Environment with Eclipse on Windows (Distributed Mode)

2015-12-23 19:46

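I. Overview
I recently started building a university cloud platform and, a few days ago, shared my experience installing and configuring a Hadoop cluster test environment. This article describes how to set up a development environment in which Eclipse 4.2 on 64-bit Windows 7 connects to a remote hadoop-1.2.0 cluster running on Redhat Linux 5.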

II. Environment
1. Windows 7, 64-bit
2. Eclipse 4.2
3. Redhat Linux 5
4. hadoop-1.2.0

III. Installing and configuring the Hadoop cluster
See my earlier article:
/article/1414410.html
http://www.jialinblog.com/?p=176

IV. Installing and configuring the Hadoop plugin in Eclipse
1. Build the Eclipse Hadoop plugin
See: /article/4693665.html

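2. Install
Installing the plugin is simple: copy the plugin file built in the previous step into the plugins directory under the Eclipse installation directory, then restart Eclipse.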

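3. Configure
(1) Extract Hadoop to a directory on the Windows file system.
(2) Open Eclipse and choose a workspace.

Open Window --> Preferences. A Hadoop Map/Reduce entry now appears there; set its Hadoop installation directory to the directory from step (1), then close the dialog.

(3) Choose Window -> Open Perspective -> Other..., then select Map/Reduce (the entry with the elephant icon). This opens the Map/Reduce development perspective, and a Map/Reduce Locations panel appears in the lower right, as shown in the figure below.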

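Create a new location and fill in the dialog that opens:

Location Name: a name for this set of settings; any value will do.

Map/Reduce Master: the Map/Reduce address of the Hadoop cluster; it should match the mapred.job.tracker setting in mapred-site.xml.

DFS Master: the address of the Hadoop master server; it should match the fs.default.name setting in core-site.xml.

Click Finish to apply the settings.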
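Because both masters are entered as an address, it can help to pull the host and port out of the cluster's configuration values before typing them in. The following is a minimal sketch using only the JDK. The host name `master` and the ports 9000 and 9001 are placeholder assumptions, not values from this setup, and `MasterAddress` is a hypothetical helper, not part of Hadoop or the plugin.

```java
import java.net.URI;

public class MasterAddress {
    public final String host;
    public final int port;

    public MasterAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Parses a value such as "hdfs://master:9000" (fs.default.name)
     * or a bare "master:9001" (mapred.job.tracker).
     */
    public static MasterAddress parse(String value) {
        // A bare host:port has no scheme, so add a dummy one for URI parsing.
        String normalized = value.contains("://") ? value : "dummy://" + value;
        URI uri = URI.create(normalized);
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("Expected host:port in: " + value);
        }
        return new MasterAddress(uri.getHost(), uri.getPort());
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the ones from your own site files.
        MasterAddress dfs = parse("hdfs://master:9000");
        MasterAddress jobTracker = parse("master:9001");
        System.out.println("DFS Master: host=" + dfs.host + ", port=" + dfs.port);
        System.out.println("Map/Reduce Master: host=" + jobTracker.host
                + ", port=" + jobTracker.port);
    }
}
```

If either value fails to parse, that is a hint that the corresponding property on the cluster is not a plain host:port address and should be checked before configuring the location.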

The DFS directory tree now appears in the Project Explorer on the far left, as shown in the figure below.

Configuration is complete.

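V. Testing
Create a new project: File --> New --> Other --> Map/Reduce Project. The project name can be anything, for example hadoop_test_01.

The dependency jars are added to the project automatically.

You can then run the WordCount example that ships with Hadoop: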

/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.jialin.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Strip generic Hadoop options; what remains must be <in> and <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        // Exit with 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

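Setting the run arguments:

Right-click WordCount and choose Run As - Run Configurations.

Set the arguments to match your own environment.

The input directory contains two files, input1 and input2, whose contents are "hello world" and "hello hadoop" respectively.
The output directory does not need to be created by hand; Hadoop creates it, and the job fails if it already exists.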

Run:
Right-click WordCount and choose Run As - Run on Hadoop.

After the job succeeds, check the contents of the file in the output directory:
hadoop 1
hello 2
world 1
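To check the expected counts without a cluster, the same tokenize-and-sum logic can be reproduced in plain Java. This is only a sketch: `LocalWordCount` is a hypothetical class that uses no Hadoop APIs, it needs Java 8 or later for `Map.merge`, and the two input lines are the file contents described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    /** Counts whitespace-separated tokens across all lines, keeping keys sorted. */
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same tokenization as TokenizerMapper: split on whitespace.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Stands in for emitting (word, 1) and summing in the reducer.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of input1 and input2.
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running it prints the same three counts as the job output, which makes it a quick way to tell a logic error apart from a cluster or configuration problem.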

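Note: how to resolve problems encountered during testing

Resolving permission problems
1. Hadoop permissions

If the user name you are logged in to Windows with differs from the Hadoop cluster's user name, you will not have permission to access the cluster and an error is reported.

My current approach is to turn off permission checking on the Hadoop cluster during development. For a production release, you can create a user on the server with the same name as the Hadoop cluster user, so the master's permissions policy does not need to be changed.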

For details, see my article:
/article/1414411.html

http://www.jialinblog.com/?p=172
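A quick way to spot the user name mismatch before submitting a job is to compare the local login name with the cluster user. The sketch below uses only the JDK; `UserCheck` is a hypothetical helper, and the default cluster user name `hadoop` is an assumption that should be replaced with the actual account on your cluster.

```java
public class UserCheck {

    /** Returns true when the local OS user name equals the cluster user name. */
    public static boolean matchesClusterUser(String localUser, String clusterUser) {
        // Linux user names are case-sensitive, so compare exactly.
        return localUser != null && localUser.equals(clusterUser);
    }

    public static void main(String[] args) {
        // The name of the account logged in to Windows.
        String localUser = System.getProperty("user.name");
        // Assumed cluster user; pass the real one as the first argument.
        String clusterUser = args.length > 0 ? args[0] : "hadoop";

        if (matchesClusterUser(localUser, clusterUser)) {
            System.out.println("Local user matches cluster user: " + localUser);
        } else {
            System.err.println("Local user '" + localUser + "' differs from cluster user '"
                    + clusterUser + "'; expect a permission error on access.");
        }
    }
}
```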

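2. The 0700 problem on Windows
This problem had me stuck for several days. I finally solved it by modifying the FileUtil class from the Hadoop source in hadoop-core-1.2.0.jar, rebuilding hadoop-core-1.2.0.jar, and replacing the original jar.

For details, see my article: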
/article/1414412.html
http://www.jialinblog.com/?p=174
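This article does not show what was changed in FileUtil. One commonly reported workaround for this kind of error is to stop treating a failed permission call as fatal, since the `java.io.File` permission setters can return false on Windows. The sketch below illustrates that idea with the JDK only. It is not Hadoop's code, and `LenientPermissions` and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

public class LenientPermissions {

    /**
     * Tries to restrict a file to owner-only access (similar to mode 0700).
     * Returns false instead of throwing when any call is not honored.
     */
    public static boolean setOwnerOnly(File f) {
        boolean ok = true;
        // Clear each permission for everyone, then grant it to the owner only.
        ok &= f.setReadable(false, false);
        ok &= f.setReadable(true, true);
        ok &= f.setWritable(false, false);
        ok &= f.setWritable(true, true);
        ok &= f.setExecutable(false, false);
        ok &= f.setExecutable(true, true);
        if (!ok) {
            // A strict implementation would throw here; this sketch only warns.
            System.err.println("Warning: could not fully set owner-only permissions on " + f);
        }
        return ok;
    }

    /** Runs the permission change on a temporary file; true if it completed without throwing. */
    public static boolean demo() {
        try {
            File tmp = File.createTempFile("perm-demo", ".tmp");
            tmp.deleteOnExit();
            System.out.println("All permission calls honored: " + setOwnerOnly(tmp));
            return true;
        } catch (IOException e) {
            System.err.println("Could not create temp file: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        demo();
    }
}
```

Relaxing the check this way trades safety for convenience, so it is better suited to a development machine than to the cluster nodes themselves.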

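VI. Summary

With this, the basic Hadoop cluster development environment for the university cloud platform is in place; what remains is to build on it. For simple tests, I recommend standalone or pseudo-distributed Hadoop instead. I chose neither only because I wanted to simulate a real environment as closely as possible. Choose whichever fits your needs.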
