
Renmin University's Online Judge for Cloud Computing Programming: Solution Report for Problems 1001-1003

2015-05-10 18:18

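I have been busy looking for an internship these past few days, so the blog fell behind. Time to catch up.

Many of you probably know PKU Online Judge. Renmin University of China now offers a similar platform, but unlike the PKU system, this one is dedicated to judging MapReduce programming problems.

Here is the link, so you can try it yourself: http://cloudcomputing.ruc.edu.cn/index.jsp

Before attempting any problem, read the FAQ ("常见问题") and write your program in the format the system requires. Otherwise it will not run correctly. (I got three runtime errors in a row before I did. - -!)

The platform does not have many problems yet: only 1000-1009 are listed, and 1008-1009 have not been published. So we will discuss 1000-1007.

If you want to try the problems yourself first, skip the rest of this post for now. Once you have solved some of them, come back and compare notes so we can all improve.

1000 is simple and can be solved with the examples bundled with Hadoop, so I will not go into it here.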

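Problem 1001:
a+b per line

Description

Sometimes you run into a problem like this: you have a table that gives each person's income in December, January, and February. The table looks like this: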

name  Dec   Jan($)

CM    200   314

LY    2000  332

QQM   6000  333

ZYM   5000  333

BP    30    12

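You need each person's total income over these three months, so you have to add up the income figures in each row of the table. Write a program to solve this problem.

Input

The input consists of a single file containing a table with the following structure: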

1 200   314

2 2000  332

3 6000  333

4 5000  333

5 30    12

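The number at the start of each line is the line label.

Output

The output is a text file. On each line, the first number is the line label and the second is the sum of all the numbers on the corresponding input line other than the label. For example: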

1 514

2 2332

3 6333

4 5333

5 42

Sample input

input:

1 200   314

2 2000  332

3 6000  333

4 6000  333

5 5000  333

6 30    12

Sample output:

1 514

2 2332

3 6333

4 6333

5 5333

6 42

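Notes:

1 There is exactly one input file and one output file;

2 The first number on every line of both the input and the output file is the line label;

3 Every value is a positive integer or zero.

1001 Approach:

Problem 1001 is actually quite simple: split each input line on whitespace, use the first field (the line label) as the key, and emit the sum of the second and third fields as the value.

The framework automatically sorts the map output by key on its way to the reduce stage, so we do not have to worry about ordering. We will discuss the sort order of keys later.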
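Since key ordering matters in all three problems, here is a minimal plain-Java sketch (no Hadoop needed) of how the key type changes the order. It assumes, as far as I understand Hadoop, that LongWritable keys compare numerically and Text keys compare byte by byte, which for ASCII input matches Java's String ordering. The class and method names (KeyOrderDemo, sortAsNumbers, sortAsStrings) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: imitates the order in which sorted keys would reach reduce.
final class KeyOrderDemo {

    // Numeric ordering, standing in for LongWritable keys.
    static List<Long> sortAsNumbers(List<String> raw) {
        List<Long> keys = new ArrayList<Long>();
        for (String s : raw) {
            keys.add(Long.parseLong(s.trim()));
        }
        Collections.sort(keys);
        return keys;
    }

    // Lexicographic ordering, standing in for Text keys (ASCII input assumed).
    static List<String> sortAsStrings(List<String> raw) {
        List<String> keys = new ArrayList<String>(raw);
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("9", "10", "2");
        System.out.println(sortAsNumbers(raw)); // [2, 9, 10]
        System.out.println(sortAsStrings(raw)); // [10, 2, 9]
    }
}
```

This is why 1002 parses each line into a number before using it as a key, and why the sample output of 1003 lists 2006-6-10 before 2006-6-9.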

Here is the code:


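import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    // Input: (byte offset, line). Output: (line label, sum of the remaining fields).
    public static class wordcountMapper extends Mapper<LongWritable, Text, LongWritable, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString()); // split on whitespace
            if (!itr.hasMoreTokens()) {
                return; // blank line: nothing to emit
            }
            // The first field is the line label.
            LongWritable label = new LongWritable(Long.parseLong(itr.nextToken()));
            int sum = 0;
            while (itr.hasMoreTokens()) {
                sum += Integer.parseInt(itr.nextToken()); // add up the remaining fields
            }
            context.write(label, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "MyMapre");
        job.setJarByClass(MyMapre.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(IntWritable.class);

        // No reducer is set, so the default identity reducer writes the sorted pairs.
        job.setMapperClass(wordcountMapper.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}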
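The per-line arithmetic in the mapper above can be checked on a laptop before submitting. The following is only a sketch in plain Java: LineSum and sumLine are hypothetical names, and the tab between label and sum is an assumption based on what I believe is the default separator of Hadoop's text output.

```java
import java.util.StringTokenizer;

// Hypothetical helper: the same per-line arithmetic as the mapper, minus Hadoop.
final class LineSum {

    // Returns "label<TAB>sum" for one input line, or null for a blank line.
    static String sumLine(String line) {
        StringTokenizer itr = new StringTokenizer(line); // splits on runs of whitespace
        if (!itr.hasMoreTokens()) {
            return null;
        }
        long label = Long.parseLong(itr.nextToken()); // first field: line label
        int sum = 0;
        while (itr.hasMoreTokens()) {
            sum += Integer.parseInt(itr.nextToken()); // remaining fields: incomes
        }
        return label + "\t" + sum; // tab separator is an assumption
    }

    public static void main(String[] args) {
        // Two lines taken from the sample input of problem 1001.
        System.out.println(sumLine("1 200   314"));
        System.out.println(sumLine("6 30    12"));
    }
}
```

Because StringTokenizer treats any run of spaces as one delimiter, the uneven spacing in the sample input needs no special handling.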

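Problem 1002:

Sort

Description

Your program must read the input data files and output the data sorted in ascending order. In the input files, each line represents one data item.

Input

The input is a set of text files. Each line of a file is one record, and each record is a string of digits representing a number to be sorted.

Output

On each line of the output file, the first number is the line label and the second is a value from the original input, in sorted order. Note that the sort is ascending, from smallest to largest.

Sample input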

input1:

2

32

654

32

15

756

65223

input2:

5956

22

650

92

input3:

26

54

6

Sample output:

1 2

2 6

3 15

4 22

5 26

6 32

7 32

8 54

9 92

10 650

11 654

12 756

13 5956

14 65223

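1002 Approach:

As mentioned for the previous problem, the map output is sorted by key automatically, so after reading a line (one record) we emit it as the key and pass it on to reduce. The sample output also needs a line label, so we declare an int outside the reduce method to count the lines written so far and emit it as the output key, with the key received from the map stage emitted as the output value. The sample output keeps duplicates (32 appears twice, as lines 6 and 7), so every received value needs its own output line. The counter is only correct when the job runs with a single reduce task, which is Hadoop's default.

On to the code: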


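import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    // Emits each number as the key so that the framework sorts it; the value is not used.
    public static class wordcountMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String one = value.toString().trim();
            if (one.isEmpty()) {
                return; // skip blank lines
            }
            context.write(new LongWritable(Long.parseLong(one)), key);
        }
    }

    public static class wordcountReduce extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

        private long sum = 0; // running line label; only correct with a single reduce task

        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // One output line per value, so a number that occurs twice is written twice.
            for (LongWritable ignored : values) {
                sum++;
                context.write(new LongWritable(sum), key);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Sort");
        job.setJarByClass(MyMapre.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        job.setNumReduceTasks(1); // the line counter needs every key in one reducer

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}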
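To see what the job above should produce without a cluster, here is a rough plain-Java imitation. It is a sketch under assumptions: a TreeMap stands in for the sorting and grouping that Hadoop does between map and reduce, a single counter stands in for a single reduce task, and SortWithLabels and sortAndLabel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the 1002 job: a TreeMap plays the role of the shuffle.
final class SortWithLabels {

    static List<String> sortAndLabel(List<String> lines) {
        // "Map + shuffle": group equal numbers under one sorted key and count them.
        Map<Long, Integer> grouped = new TreeMap<Long, Integer>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty()) {
                continue; // skip blank lines
            }
            Long key = Long.valueOf(s);
            Integer seen = grouped.get(key);
            grouped.put(key, seen == null ? 1 : seen + 1);
        }
        // "Reduce": one output line per value, so duplicates each get their own label.
        List<String> out = new ArrayList<String>();
        long label = 0;
        for (Map.Entry<Long, Integer> e : grouped.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                label++;
                out.add(label + " " + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // input1, input2 and input3 from the problem statement, concatenated.
        List<String> input = Arrays.asList(
                "2", "32", "654", "32", "15", "756", "65223",
                "5956", "22", "650", "92",
                "26", "54", "6");
        for (String line : sortAndLabel(input)) {
            System.out.println(line);
        }
    }
}
```

The duplicate 32 is the case worth watching: it has to appear under two consecutive labels, as it does in the sample output.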

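Problem 1003:
Data deduplication

Description

Your program must read the input files and output the result after removing all duplicate data. Each line of an input file is one record.

Input

The input is a set of text files. Each line of each input file is one record, and every record is a string.

Output file

Each line of the output file is a record that appeared in the input files, and no two lines of the output file are the same.

Sample input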

input1:

2006-6-9 a

2006-6-10 b

2006-6-11 c

2006-6-12 d

2006-6-13 a

2006-6-14 b

2006-6-15 c

2006-6-11 c

input2:

2006-6-9 b

2006-6-10 a

2006-6-11 b

2006-6-12 d

2006-6-13 a

2006-6-14 c

2006-6-15 d

2006-6-11 c

Sample output:

2006-6-10 a

2006-6-10 b

2006-6-11 b

2006-6-11 c

2006-6-12 d

2006-6-13 a

2006-6-14 b

2006-6-14 c

2006-6-15 c

2006-6-15 d

2006-6-9 a

2006-6-9 b

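Notes:

1 The output is sorted in dictionary (lexicographic) order;

2 Each line is one record;

3 Data that is duplicated in the input must still be output once.

1003 Approach:

As before, split each line into fields: emit the first field as the map output key and the second field as the map output value.

In reduce, all the values that share a key arrive together. The problem requires that no letter appear twice for the same key, so we have to drop the duplicates, and the result also has to be sorted, which Java's built-in sort can handle.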
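One detail is worth checking here. As far as I know, Hadoop does not promise any particular order for the values within one key unless a secondary sort is configured, so a "compare with the previous value" check only removes duplicates reliably once the values are sorted. A minimal plain-Java sketch of the difference follows; the names are hypothetical and the arrival order c, b, c, c is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of de-duplicating the values that arrive for one key.
final class AdjacentDedup {

    // Drops a value only when it equals the one immediately before it.
    static List<String> dropAdjacentRepeats(List<String> values) {
        List<String> out = new ArrayList<String>();
        String pre = null;
        for (String v : values) {
            if (!v.equals(pre)) {
                out.add(v);
                pre = v;
            }
        }
        return out;
    }

    // Sorts a copy first, so equal values are guaranteed to be neighbours.
    static List<String> sortThenDedup(List<String> values) {
        List<String> sorted = new ArrayList<String>(values);
        Collections.sort(sorted);
        return dropAdjacentRepeats(sorted);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("c", "b", "c", "c");
        System.out.println(dropAdjacentRepeats(values)); // [c, b, c]
        System.out.println(sortThenDedup(values));       // [b, c]
    }
}
```

Sorting first costs memory for every value of the key, which is fine for inputs of this size.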

Here is the code:


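import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    // Splits "date letter" into key = date, value = letter.
    public static class wordcountMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            Text word = new Text();
            Text one = new Text();
            StringTokenizer itr = new StringTokenizer(value.toString()); // split on whitespace
            if (itr.hasMoreTokens()) word.set(itr.nextToken()); // first field
            if (itr.hasMoreTokens()) one.set(itr.nextToken());  // second field
            context.write(word, one);
        }
    }

    public static class wordcountReduce extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Values for a key arrive in no guaranteed order, so collect and sort them first.
            List<String> list = new ArrayList<String>();
            for (Text str : values) {
                list.add(str.toString());
            }
            Collections.sort(list);

            String pre = null; // previous value written, used to skip repeats
            for (String cur : list) {
                if (!cur.equals(pre)) { // after sorting, equal values are adjacent
                    context.write(key, new Text(cur));
                    pre = cur;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "deduplication");
        job.setJarByClass(MyMapre.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}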
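For a quick local check of the expected result, the whole job can be imitated in plain Java. This is a sketch, not the job itself: it swaps in a TreeMap of TreeSets for Hadoop's grouping and for the sort in the reducer, joins key and value with a space as in the sample output, and uses hypothetical names (DedupDemo, dedup, SAMPLE).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the 1003 job: sorted collections replace the shuffle.
final class DedupDemo {

    // input1 followed by input2 from the problem statement.
    static final List<String> SAMPLE = Arrays.asList(
            "2006-6-9 a", "2006-6-10 b", "2006-6-11 c", "2006-6-12 d",
            "2006-6-13 a", "2006-6-14 b", "2006-6-15 c", "2006-6-11 c",
            "2006-6-9 b", "2006-6-10 a", "2006-6-11 b", "2006-6-12 d",
            "2006-6-13 a", "2006-6-14 c", "2006-6-15 d", "2006-6-11 c");

    static List<String> dedup(List<String> lines) {
        // Key = first field, values = distinct second fields in sorted order.
        Map<String, TreeSet<String>> grouped = new TreeMap<String, TreeSet<String>>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            TreeSet<String> vals = grouped.get(fields[0]);
            if (vals == null) {
                vals = new TreeSet<String>();
                grouped.put(fields[0], vals);
            }
            vals.add(fields[1]); // a set silently ignores repeats
        }
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, TreeSet<String>> e : grouped.entrySet()) {
            for (String v : e.getValue()) {
                out.add(e.getKey() + " " + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : dedup(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Because the keys are compared as strings, 2006-6-10 sorts before 2006-6-9, which is the order the sample output shows.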
