
The First MapReduce Program on a Distributed Hadoop Cluster: WordCount

2014-01-21 11:21
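I have already covered how to set up a distributed Hadoop cluster in an earlier post on this blog, about building a fully distributed Hadoop cluster on Ubuntu. For details, see /article/9491961.html.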

I. Installing Spring Tool Suite (STS) on Linux

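First, go to http://eclipse.org/downloads/?osType=linux and download Spring Tool Suite, then extract the archive (this article uses the 3.4.0 tar.gz release). Open the extracted folder, go into sts-3.4.0.RELEASE, and run the STS executable to start STS.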



II. Building the Eclipse plugin on Linux

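Since 1.0, Hadoop no longer ships a prebuilt Eclipse plugin, so we have to build the plugin jar from source ourselves. Building on Linux is recommended; this article uses CentOS 6.4 and Hadoop 1.2.1, with Hadoop in /root/gy/hadoop-1.2.1 and STS in /root/gy/springsource. Building the Eclipse plugin takes the following four steps: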

1) Configure build.xml

Go to /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and edit build.xml: set the Eclipse (STS) root directory, the Hadoop version, and the Hadoop jars the plugin references, and add includeantruntime="on" to the javac task.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- build.xml -->
<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <property name="hadoop.dir" value="/root/gy/hadoop-1.2.1"/>
  <property name="eclipse.home" location="/root/gy/springsource"/>
  <property name="version" value="1.2.1"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>

      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <pathelement location="${hadoop.root}/build/classes"/>
    <fileset dir="${hadoop.root}">
      <include name="**/*.jar" />
    </fileset>
    <path refid="eclipse-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

  <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
        encoding="${build.encoding}"
        srcdir="${src.dir}"
        includes="**/*.java"
        destdir="${build.classes}"
        debug="${javac.debug}"
        deprecation="${javac.deprecation}"
        includeantruntime="on">
      <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>

    <copy file="${hadoop.root}/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-cli-${commons-cli.version}.jar" tofile="${build.dir}/lib/commons-cli.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" tofile="${build.dir}/lib/commons-configuration.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" tofile="${build.dir}/lib/commons-httpclient.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-lang-2.4.jar" tofile="${build.dir}/lib/commons-lang.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-core-asl.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-mapper-asl.jar" verbose="true"/>

    <jar
        jarfile="${build.dir}/hadoop-${name}-${version}.jar"
        manifest="${root}/META-INF/MANIFEST.MF">
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>


2) Edit build-contrib.xml

cd /root/gy/hadoop-1.2.1/src/contrib
vi build-contrib.xml

Change (or add) the following properties:

<property name="hadoop.root" location="/root/gy/hadoop-1.2.1"/>
<property name="eclipse.home" location="/root/gy/springsource" />
<property name="javac.deprecation" value="on"/>


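3) Edit MANIFEST.MF

In /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin/META-INF/MANIFEST.MF, set Bundle-ClassPath so that it lists the classes/ directory and every jar that the jar target copies into lib/:

Bundle-ClassPath: classes/,lib/commons-cli.jar,lib/commons-httpclient.jar,lib/hadoop-core.jar,lib/jackson-mapper-asl.jar,lib/commons-configuration.jar,lib/commons-lang.jar,lib/jackson-core-asl.jar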


4) In a shell, go to /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and run ant to build the plugin.
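If an entry listed in Bundle-ClassPath is not actually packaged in the jar, Eclipse cannot load the classes from it, so it can be worth checking the built jar before copying it into STS. The following is a minimal sketch using only java.util.jar; the class name PluginJarCheck is hypothetical, and the default jar name is an assumption derived from the jarfile attribute in build.xml (hadoop-${name}-${version}.jar, i.e. hadoop-eclipse-plugin-1.2.1.jar). Pass the real path as the first argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: reports Bundle-ClassPath entries that are missing from the plugin jar. */
public class PluginJarCheck {

    /** Returns the Bundle-ClassPath entries that have no matching entry among jarEntries. */
    static List<String> missingEntries(String bundleClassPath, Set<String> jarEntries) {
        List<String> missing = new ArrayList<>();
        if (bundleClassPath == null) {
            return missing; // no Bundle-ClassPath header: nothing to check
        }
        for (String raw : bundleClassPath.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty() || entry.equals(".")) {
                continue; // "." means the jar root itself
            }
            // A directory entry such as "classes/" is satisfied by any file stored under it
            boolean found = entry.endsWith("/")
                    ? jarEntries.stream().anyMatch(name -> name.startsWith(entry))
                    : jarEntries.contains(entry);
            if (!found) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; pass the actual jar path produced by ant as args[0]
        String path = args.length > 0 ? args[0] : "hadoop-eclipse-plugin-1.2.1.jar";
        try (JarFile jar = new JarFile(path)) {
            Manifest mf = jar.getManifest();
            String cp = (mf == null) ? null : mf.getMainAttributes().getValue("Bundle-ClassPath");
            Set<String> names = new TreeSet<>();
            for (JarEntry e : Collections.list(jar.entries())) {
                names.add(e.getName());
            }
            List<String> missing = missingEntries(cp, names);
            System.out.println(missing.isEmpty()
                    ? "OK: every Bundle-ClassPath entry is packaged"
                    : "Missing from jar: " + missing);
        }
    }
}
```

For example, `java PluginJarCheck /root/gy/hadoop-1.2.1/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-1.2.1.jar`, where the build/contrib/eclipse-plugin location is an assumption about where build-contrib.xml puts ${build.dir}.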

III. Creating a Hadoop project in Eclipse

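1. Copy the jar generated by ant (hadoop-eclipse-plugin-1.2.1.jar, as named by the jarfile attribute in build.xml) into the plugins/ directory of the Eclipse (STS) installation.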

2. Restart Eclipse and configure the Hadoop installation directory.

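If the plugin was installed successfully, open Window-->Preferences and you will find a Hadoop Map/Reduce entry. Set Hadoop installation directory there (in this article, /root/gy/hadoop-1.2.1), then close the dialog.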



3. Configure Map/Reduce Locations.

Open the Map/Reduce Locations view via Window-->Show View.

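Create a new Hadoop location in that view: right-click and choose New Hadoop Location. In the dialog, enter a Location name (for example, Hadoop) and fill in Map/Reduce Master and DFS Master. Their Host and Port must match the addresses and ports configured in mapred-site.xml (mapred.job.tracker) and core-site.xml (fs.default.name), respectively.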
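To avoid typos in the dialog, the host/port pairs can be read straight from the site files. This is a hedged sketch with JDK XML parsing only: it assumes the Hadoop 1.x property names fs.default.name (core-site.xml, for DFS Master) and mapred.job.tracker (mapred-site.xml, for Map/Reduce Master), and that the files live in /root/gy/hadoop-1.2.1/conf. The class name SiteConfigReader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: prints the Host/Port values for the New Hadoop Location dialog. */
public class SiteConfigReader {

    /** Returns the <value> of the <property> with the given <name>, or null if absent. */
    static String getProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (name.equals(childText(p, "name"))) {
                    return childText(p, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse site XML", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Splits "hdfs://master:9000" or "master:9001" into {host, port}. */
    static String[] hostAndPort(String address) {
        String a = address.trim();
        int scheme = a.indexOf("://");
        if (scheme >= 0) {
            a = a.substring(scheme + 3); // drop the "hdfs://" scheme
        }
        int slash = a.indexOf('/');
        if (slash >= 0) {
            a = a.substring(0, slash); // drop any trailing path
        }
        int colon = a.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no port in " + address);
        }
        return new String[] { a.substring(0, colon), a.substring(colon + 1) };
    }

    public static void main(String[] args) throws Exception {
        // Assumed config directory (Hadoop 1.x keeps site files under conf/); override with args[0]
        String confDir = args.length > 0 ? args[0] : "/root/gy/hadoop-1.2.1/conf";
        String core = new String(Files.readAllBytes(Paths.get(confDir, "core-site.xml")), StandardCharsets.UTF_8);
        String mapred = new String(Files.readAllBytes(Paths.get(confDir, "mapred-site.xml")), StandardCharsets.UTF_8);

        String[] dfs = hostAndPort(Objects.requireNonNull(
                getProperty(core, "fs.default.name"), "fs.default.name not set"));
        String[] mr = hostAndPort(Objects.requireNonNull(
                getProperty(mapred, "mapred.job.tracker"), "mapred.job.tracker not set"));
        System.out.println("DFS Master:        host=" + dfs[0] + " port=" + dfs[1]);
        System.out.println("Map/Reduce Master: host=" + mr[0] + " port=" + mr[1]);
    }
}
```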



4. Create a new project

File-->New-->Other-->Map/Reduce Project. Any project name will do, for example WordCount.



5. WordCount source code

package com.test.word;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// WordCount written against the old org.apache.hadoop.mapred API shipped with Hadoop 1.x
public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also registered as the combiner): sums all counts received for one word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJarByClass(com.test.word.WordCount.class);
        conf.setJobName("wordcount");

        // Types of the job's final (key, value) output
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Paths are resolved against the default file system (fs.default.name);
        // the output directory must not exist yet, or the job is rejected at submission
        FileInputFormat.setInputPaths(conf, new Path("/root/gy/testData/input.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/root/gy/testData/output"));
        JobClient.runJob(conf);
    }
}
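The data flow of the job above can be traced without a cluster. The sketch below (hypothetical class LocalWordCount, no Hadoop dependency) replays map, combine, shuffle/sort and reduce on in-memory collections, treating each input line as the input of one map task. It only illustrates the logic of the Map and Reduce classes; it is not how Hadoop actually schedules or executes the job.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory walk-through of WordCount's map -> combine -> shuffle -> reduce. */
public class LocalWordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token, like Map.map()
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle/sort: group values by key; TreeMap keeps keys sorted, as keys reach a reducer in sorted order
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts for one key, like Reduce.reduce()
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : lines) {
            List<Map.Entry<String, Integer>> mapped = map(line);
            if (useCombiner) {
                // Combiner: run reduce on this map task's own output before the shuffle
                for (Map.Entry<String, List<Integer>> e : group(mapped).entrySet()) {
                    shuffled.add(new AbstractMap.SimpleEntry<>(e.getKey(), reduce(e.getValue())));
                }
            } else {
                shuffled.addAll(mapped);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(shuffled).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello hadoop", "hello world hello");
        System.out.println(wordCount(input, false)); // {hadoop=1, hello=3, world=1}
        System.out.println(wordCount(input, true));  // {hadoop=1, hello=3, world=1}
    }
}
```

Both runs print the same counts: summing is associative and commutative, so pre-aggregating each map's output does not change the final totals. That property is what makes it safe to pass the Reduce class to setCombinerClass as well as setReducerClass.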
6. Result: the job runs successfully and its log is printed in the console.



This is the hadoop-eclipse-plugin jar I built with ant; it can be used as-is. Download: http://download.csdn.net/detail/luoluowushengmimi/6869717