Hadoop: Hands-on MapReduce WordCount with Eclipse (1)

2017-03-03 11:58
(1) Download Eclipse from the official site: http://www.eclipse.org/
(2) Look up the Hadoop artifacts on Maven: http://www.mvnrepository.com

Select the version that matches your Hadoop installation (2.4.1 here), then copy the corresponding dependency declarations into your project's pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
</dependency>

(3) Unzip the downloaded Eclipse.

    (3.1) Right-click the project | Build Path | Configure Build Path

    (3.2) Install the Hadoop Eclipse plugin
   (a) Close Eclipse, then copy the plugin jar into the D:\eclipse-jee-mars-2-win32\eclipse\plugins directory.

   (b) Start Eclipse; Window | Preferences | (note: use Browse... to select your current Hadoop directory)

   (c) Window | Show View | Other
   (d) Hadoop must be started on the Linux side first:
        [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-dfs.sh
        [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-yarn.sh

   (e) Click the elephant icon in the lower-right corner and choose New Hadoop Location... to configure the connection.
   (f) The resulting view shows the HDFS file tree under DFS Locations.

   (3.3) Hadoop's bin directory lacks the Windows native binaries, so add them:

      (a) Copy the winutils.exe file into the C:\hadoop-2.4.1\hadoop-2.4.1\bin directory.
      (b) Copy the hadoop.dll file into the C:\Windows\System32 directory.
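
If the job later fails on Windows because winutils.exe cannot be found, explicitly pointing hadoop.home.dir at the installation usually helps. A minimal sketch, assuming the C:\hadoop-2.4.1\hadoop-2.4.1 path from step (a):

// Set before any Hadoop classes read the configuration, e.g. as the first line of main().
// Assumption: Hadoop is unpacked at C:\hadoop-2.4.1\hadoop-2.4.1 as in step (a).
System.setProperty("hadoop.home.dir", "C:\\hadoop-2.4.1\\hadoop-2.4.1");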
 
    (3.4) Write the source code

The WordCountMapper class

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Extends Mapper. Type mapping: LongWritable ==> long, Text ==> String, IntWritable ==> int
 *
 * @author Administrator
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * Override the map method
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1) Get one line of input, e.g. "hello hadoop"
        String line = value.toString();

        // 2) Split the line into words: "hello", "hadoop"
        String[] splits = line.split(" ");

        // 3) Emit each word with a count of 1:
        //    hello  1
        //    hadoop 1
        for (String str : splits) {
            context.write(new Text(str), new IntWritable(1));
        }
    }
}
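
To sanity-check the mapper without a cluster, an MRUnit test works well. A minimal sketch, assuming you also add the org.apache.mrunit:mrunit:1.1.0 test dependency (classifier hadoop2), which is not in the pom fragment above:

package com.hlx.mapreduce.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    @Test
    public void mapEmitsOnePerWord() throws Exception {
        // Feed one line in; expect (word, 1) out for each word, in order.
        MapDriver.newMapDriver(new WordCountMapper())
                .withInput(new LongWritable(0), new Text("hello hadoop"))
                .withOutput(new Text("hello"), new IntWritable(1))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .runTest();
    }
}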


The WordCountReduce class

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Extends Reducer. Type mapping: Text ==> String, IntWritable ==> int
 * (input (key, value), output (key, value))
 *
 * @author Administrator
 */
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Map output arrives as pairs like: a 1, b 1, c 1, ...
    // After the shuffle, each key is grouped with all of its values:
    // hello {1,1,1} ==> hello 3 (the {1,1,1} iterable is the values parameter)
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0; // running total for this key

        // Sum up all the 1s the mapper emitted for this word
        for (IntWritable value : values) {
            count += value.get();
        }

        // Write (word, total) to the context
        context.write(key, new IntWritable(count));
    }
}
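
The grouping described in the comment above (hello {1,1,1} ==> hello 3) can be verified the same way. A minimal sketch with MRUnit's ReduceDriver, under the same test-dependency assumption as before:

package com.hlx.mapreduce.wc;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReduceTest {

    @Test
    public void reduceSumsTheOnes() throws Exception {
        // hello {1,1,1} should come out as (hello, 3)
        ReduceDriver.newReduceDriver(new WordCountReduce())
                .withInput(new Text("hello"), Arrays.asList(
                        new IntWritable(1), new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("hello"), new IntWritable(3))
                .runTest();
    }
}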
 

The WordCountMapReduce driver class

package com.hlx.mapreduce.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Driver class
 *
 * @author Administrator
 */
public class WordCountMapReduce {

    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();

        // Create the job object
        Job job = Job.getInstance(conf, "wordcount0");

        // Set the main class to run
        job.setJarByClass(WordCountMapReduce.class);

        // Set the map class
        job.setMapperClass(WordCountMapper.class);

        // Set the reduce class
        job.setReducerClass(WordCountReduce.class);

        // Set the map output (key, value) types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the reduce output (key, value) types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths: /words is the input file, /out3 the output directory
        FileInputFormat.setInputPaths(job, new Path("hdfs://master-hadoop.dragon.org:9000/words"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://master-hadoop.dragon.org:9000/out3"));

        // Submit the job and wait for it to finish
        boolean flag = job.waitForCompletion(true);
        if (!flag) {
            System.out.println("the task has failed!");
        }
    }
}
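
One pitfall when re-running the job: FileOutputFormat refuses to write into an existing directory, so a second run against /out3 fails with a FileAlreadyExistsException. A minimal sketch that clears a stale output directory first (a hypothetical addition to main(), after conf is created and before the job is submitted):

// Hypothetical pre-flight step: remove the old output directory so re-runs succeed.
org.apache.hadoop.fs.FileSystem fs = org.apache.hadoop.fs.FileSystem.get(
        java.net.URI.create("hdfs://master-hadoop.dragon.org:9000"), conf);
org.apache.hadoop.fs.Path out = new org.apache.hadoop.fs.Path("/out3");
if (fs.exists(out)) {
    fs.delete(out, true); // true = delete recursively
}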

    (3.5) Run the job

The word counts end up in the /out3 directory on HDFS.
Note: the source code above can still be optimized! For instance, the mapper allocates a new Text and IntWritable for every single word, and all aggregation happens in the reducer.
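
A minimal sketch of two standard optimizations (standard Hadoop practice, not from the original post): reuse the Writable instances, and register the reducer as a combiner so map output is pre-aggregated before the shuffle.

// In WordCountMapper: reuse Writable instances instead of allocating per record.
// Safe because Hadoop copies the key/value during context.write().
private static final IntWritable ONE = new IntWritable(1);
private final Text word = new Text();

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    for (String str : value.toString().split(" ")) {
        word.set(str);
        context.write(word, ONE);
    }
}

// In the driver: valid here because summing 1s is associative and commutative.
job.setCombinerClass(WordCountReduce.class);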

  