
Hadoop Learning Notes (7) | Remote Debugging Hadoop from Eclipse

2014-03-11 16:17
1. Create the Hadoop project

2. Create the package and class

Here we use hdfs.WordCount as the example.
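
For reference, a minimal sketch of the class skeleton and the imports that the snippets in the next step rely on. This assumes the Hadoop 1.x "new" API (org.apache.hadoop.mapreduce); the package and class name follow the hdfs.WordCount example above:

package hdfs;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount {
    // MyMapper, MyReducer and main go here (see step 3)
}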

3. Write the custom Mapper and Reducer

The MyMapper class:

static class MyMapper extends
        Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable k1, Text v1, Context context)
            throws IOException, InterruptedException {
        // Tokenize the input line
        StringTokenizer tokenizer = new StringTokenizer(v1.toString());
        // Reusable output key k2
        Text k2 = new Text();
        // Emit <word, 1> for every token
        while (tokenizer.hasMoreTokens()) {
            k2.set(tokenizer.nextToken());
            context.write(k2, new LongWritable(1));
        }
    }
}

The MyReducer class:

static class MyReducer extends
        Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s,
            Context context) throws IOException, InterruptedException {
        // Sum all counts for the word k2
        long sum = 0;
        for (LongWritable val : v2s) {
            sum += val.get();
        }
        context.write(k2, new LongWritable(sum));
    }
}


Write the main driver method:

public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }

    Configuration conf = new Configuration();
    Job job = new Job(conf, WordCount.class.getSimpleName());
    // Running from the Eclipse plugin is equivalent to running from a jar
    job.setJarByClass(WordCount.class);
    // Set the mapper
    job.setMapperClass(MyMapper.class);
    // Type of the map output key k2
    job.setMapOutputKeyClass(Text.class);
    // Type of the map output value v2
    job.setMapOutputValueClass(LongWritable.class);
    // Partitioner class
    job.setPartitionerClass(HashPartitioner.class);
    // Number of reduce tasks
    job.setNumReduceTasks(1);
    // Set the reducer
    job.setReducerClass(MyReducer.class);
    // Output format
    job.setOutputFormatClass(TextOutputFormat.class);
    // Type of the output key k3
    job.setOutputKeyClass(Text.class);
    // Type of the output value v3
    job.setOutputValueClass(LongWritable.class);

    // Input and output paths are passed in as program arguments
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit the job; waitForCompletion returns false on failure,
    // so exit the JVM with 1 on failure and 0 on success
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

4. Run the MapReduce program against the remote Hadoop cluster

First, configure the access paths in the run configuration.

Enter the HDFS access paths as the program arguments.
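
For example, the two program arguments passed to main might look like the lines below. The hostname, port, and directories are placeholders; use your own NameNode address (the value of fs.default.name) and input/output paths:

hdfs://master:9000/input
hdfs://master:9000/output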

Running it now via Run As > Run on Hadoop produces the following error:

14/03/11 15:58:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/11 15:58:22 ERROR security.UserGroupInformation: PriviledgedActionException as:Sky cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at hdfs.WordCount.main(WordCount.java:58)

This is a file-permission problem on Windows; the same job runs fine on Linux.

Solution:

Open F:\Software\Hadoop\hadoop-1.1.2\src\core\org\apache\hadoop\fs\FileUtil.java

Comment out the body of the checkReturnValue method and save.
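
After the change, the method becomes a no-op; it looks roughly like the sketch below (based on the Hadoop 1.1.2 source). Note that the patched class only takes effect if it is compiled and picked up ahead of hadoop-core-1.1.2.jar on the classpath, for example by keeping the Hadoop source folder in the Eclipse project:

private static void checkReturnValue(boolean rv, File p,
        FsPermission permission) throws IOException {
    /*
    // Original body: throw when setting permissions failed.
    // Commented out so the failing chmod on Windows is ignored.
    if (!rv) {
        throw new IOException("Failed to set permissions of path: " + p
                + " to " + String.format("%04o", permission.toShort()));
    }
    */
}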

Running it again now prints the counters normally and generates a new output directory. The output directory must not already exist; Hadoop creates it automatically:

14/03/11 16:08:40 INFO mapred.JobClient: map 100% reduce 100%
14/03/11 16:08:41 INFO mapred.JobClient: Job complete: job_local_0001
14/03/11 16:08:41 INFO mapred.JobClient: Counters: 19
14/03/11 16:08:41 INFO mapred.JobClient: File Output Format Counters
14/03/11 16:08:41 INFO mapred.JobClient: Bytes Written=2154020
14/03/11 16:08:41 INFO mapred.JobClient: FileSystemCounters
14/03/11 16:08:41 INFO mapred.JobClient: FILE_BYTES_READ=631320575
14/03/11 16:08:41 INFO mapred.JobClient: HDFS_BYTES_READ=141910490
14/03/11 16:08:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=774430506
14/03/11 16:08:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2154020
14/03/11 16:08:41 INFO mapred.JobClient: File Input Format Counters
14/03/11 16:08:41 INFO mapred.JobClient: Bytes Read=70955245
14/03/11 16:08:41 INFO mapred.JobClient: Map-Reduce Framework
14/03/11 16:08:41 INFO mapred.JobClient: Reduce input groups=59150
14/03/11 16:08:41 INFO mapred.JobClient: Map output materialized bytes=142981973
14/03/11 16:08:41 INFO mapred.JobClient: Combine output records=0
14/03/11 16:08:41 INFO mapred.JobClient: Map input records=255015
14/03/11 16:08:41 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/11 16:08:41 INFO mapred.JobClient: Reduce output records=59150
14/03/11 16:08:41 INFO mapred.JobClient: Spilled Records=26709860
14/03/11 16:08:41 INFO mapred.JobClient: Map output bytes=128572984
14/03/11 16:08:41 INFO mapred.JobClient: Total committed heap usage (bytes)=305004544
14/03/11 16:08:41 INFO mapred.JobClient: Combine input records=0
14/03/11 16:08:41 INFO mapred.JobClient: Map output records=7201751
14/03/11 16:08:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=99
14/03/11 16:08:41 INFO mapred.JobClient: Reduce input records=7201751
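
To spot-check the actual word counts, you can read the generated part file back from HDFS. A minimal sketch; the NameNode address and output path are assumptions matching the example arguments above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintResult {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address and output path are placeholders
        FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), conf);
        Path result = new Path("/output/part-r-00000");
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(result)));
        String line;
        int shown = 0;
        // Print the first few "word<TAB>count" lines
        while ((line = reader.readLine()) != null && shown++ < 10) {
            System.out.println(line);
        }
        reader.close();
    }
}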