
A Hadoop Word Count Example

2012-04-21 18:06
At its core, Hadoop is the Map-Reduce processing model plus the Hadoop Distributed File System (HDFS).
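
Before the code, it helps to trace the data flow on a small made-up input. map emits a (word, 1) pair for every token in a line; the framework then sorts and groups the pairs by word; reduce sums each group:

Input lines:    "hello hadoop" and "hello world"
Map output:     (hello, 1) (hadoop, 1) (hello, 1) (world, 1)
After shuffle:  (hadoop, [1]) (hello, [1, 1]) (world, [1])
Reduce output:  (hadoop, 1) (hello, 2) (world, 1)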

Step 1: Define the Map process

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Description: the Map step of word count; emits (word, 1) for every token in the input line.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:41:57 PM
 */
public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    // Reusable output objects: context.write serializes them immediately,
    // so a single instance can safely be reused across calls.
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        // Split the line on whitespace and emit (token, 1) for each token.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
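
The tokenizing loop can be tried outside Hadoop. Here is a minimal standalone sketch (the class name and sample line are made up) that runs the same StringTokenizer logic and prints what map would emit:

import java.util.StringTokenizer;

public class MapLogicDemo {
    public static void main(String[] args) {
        String line = "hello hadoop hello world"; // hypothetical input line
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Mirrors context.write(word, one) in MyMap
            System.out.println("(" + tokenizer.nextToken() + ", 1)");
        }
    }
}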

Step 2: Define the Reduce process

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Description: the Reduce step of word count; sums the counts collected for each word.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:48:18 PM
 */
public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        // Add up all the 1s (or partial sums from the combiner) for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
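
After the shuffle, reduce sees each word together with all of its counts. A minimal standalone sketch of the same summation, using a plain List in place of Hadoop's Iterable<IntWritable> (the class name and sample values are made up):

import java.util.Arrays;
import java.util.List;

public class ReduceLogicDemo {
    public static void main(String[] args) {
        // What the shuffle would hand to reduce for the key "hello"
        List<Integer> values = Arrays.asList(1, 1);
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        System.out.println("(hello, " + sum + ")"); // prints (hello, 2)
    }
}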


Step 3: Write a Driver to run the Map-Reduce job

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        // Identity used when talking to the cluster (old-style UGI setting).
        conf.set("hadoop.job.ugi", "root,root123");

        Job job = new Job(conf, "Hello,hadoop! ^_^");

        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner: summing counts is associative,
        // so partial sums on the map side give the same final result.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Also declare the final output types written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
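
To run the job, package the three classes into a jar and submit it with the hadoop launcher. The jar name and HDFS paths below are placeholders; note that the output directory must not already exist, or the job will fail at submission:

hadoop jar wordcount.jar MyDriver /user/root/input /user/root/output
hadoop fs -cat /user/root/output/part-r-00000

part-r-00000 is the default name of the first reducer's output file in the new MapReduce API.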
Tags: Hadoop Map-Reduce