Hadoop Study Notes (7) | Remote Debugging Hadoop with Eclipse
2014-03-11 16:17
1. Create a Hadoop project
2. Create the package and class
This example uses hdfs.WordCount.
3. Write the custom Mapper and Reducer
MyMapper class
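// Imports used by this class (they go at the top of WordCount.java)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable k1, Text v1, Context context)
            throws IOException, InterruptedException {
        // Split the input line into whitespace-separated tokens
        StringTokenizer tokenizer = new StringTokenizer(v1.toString());
        // Output key k2, reused for every token
        Text k2 = new Text();
        // Emit (token, 1) for each token
        while (tokenizer.hasMoreTokens()) {
            k2.set(tokenizer.nextToken());
            context.write(k2, new LongWritable(1));
        }
    }
}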
MyReducer class
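// Imports used by this class (they go at the top of WordCount.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        long sum = 0;
        for (LongWritable val : v2s) {
            sum += val.get();
        }
        context.write(k2, new LongWritable(sum));
    }
}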
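To make the k1/v1, k2/v2 flow concrete, the following is a minimal plain-Java sketch of what the Mapper and Reducer do together, with no Hadoop dependency. It is only an illustration: LocalWordCountSketch and its method names are hypothetical, a TreeMap stands in for the shuffle/sort phase that Hadoop performs between map and reduce, and String/Long stand in for Text/LongWritable.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job; not part of the original project.
final class LocalWordCountSketch {

    /** Map step: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<Map.Entry<String, Long>>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Long>(tokenizer.nextToken(), 1L));
        }
        return pairs;
    }

    /** Shuffle step (assumption: a TreeMap imitates Hadoop's group-and-sort by key). */
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<String, List<Long>>();
        for (Map.Entry<String, Long> pair : pairs) {
            List<Long> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Long>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce step: sum the values collected for one key. */
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce over the given lines. */
    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<Map.Entry<String, Long>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> counts = new TreeMap<String, Long>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {eclipse=1, hadoop=1, hello=2}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello eclipse")));
    }
}
```

In the real job the pairs are serialized and grouped by the framework rather than held in memory, but the per-key summation is the same as in MyReducer.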
Write the main driver method
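// Imports used by the driver (they go at the top of WordCount.java)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Configuration conf = new Configuration();
    Job job = new Job(conf, WordCount.class.getSimpleName());
    // Running through the Eclipse plugin is equivalent to running from a jar
    job.setJarByClass(WordCount.class);
    // Mapper class
    job.setMapperClass(MyMapper.class);
    // Type of the map output key (k2)
    job.setMapOutputKeyClass(Text.class);
    // Type of the map output value (v2)
    job.setMapOutputValueClass(LongWritable.class);
    // Partitioner class
    job.setPartitionerClass(HashPartitioner.class);
    // Number of reduce tasks
    job.setNumReduceTasks(1);
    // Reducer class
    job.setReducerClass(MyReducer.class);
    // Output format
    job.setOutputFormatClass(TextOutputFormat.class);
    // Type of the final output key (k3)
    job.setOutputKeyClass(Text.class);
    // Type of the final output value (v3)
    job.setOutputValueClass(LongWritable.class);
    // Input and output paths are passed in as program arguments
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit the job and wait for it: exit with 0 on success, 1 on failure
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}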
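The driver sets HashPartitioner as the partitioner with a single reduce task. As a rough illustration of what that choice means, the sketch below applies the hash-and-modulo rule that HashPartitioner uses to decide which reducer receives a key. HashPartitionSketch and partitionFor are hypothetical names, and String.hashCode stands in for Text.hashCode, so the partition numbers here will not match those of a real job.

```java
// Hypothetical illustration; not part of the original project.
final class HashPartitionSketch {

    /** Clears the sign bit of the hash, then takes the remainder by the number of reducers. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"hello", "hadoop", "eclipse"};
        for (String word : words) {
            // With one reducer every key lands in partition 0.
            System.out.println(word + ": partition " + partitionFor(word, 1)
                    + " of 1, partition " + partitionFor(word, 3) + " of 3");
        }
    }
}
```

With setNumReduceTasks(1), as in the driver above, every key maps to partition 0, so a single reducer produces a single output file.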
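4. Run the MapReduce program against the remote Hadoop cluster
First configure the access paths.
Enter the HDFS paths (the input and output paths passed to main as arguments).
Running the program now with Run As > Run on Hadoop produces an error: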
14/03/11 15:58:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/11 15:58:22 ERROR security.UserGroupInformation: PriviledgedActionException as:Sky cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at hdfs.WordCount.main(WordCount.java:58)
This is a file permission problem specific to Windows; the same program runs normally on Linux.
Solution:
Open F:\Software\Hadoop\hadoop-1.1.2\src\core\org\apache\hadoop\fs\FileUtil.java
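Comment out the body of the checkReturnValue function and save. For the change to take effect, the modified class has to be compiled and found on the classpath ahead of the original one in the Hadoop jar.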
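To see why the same code can behave differently on Windows and Linux, here is a small standalone sketch. It assumes, as the stack trace suggests, that the local file system code sets permissions through the boolean-returning setters of java.io.File and that checkReturnValue throws when one of them returns false. It does not reproduce the Hadoop source, and PermissionCheckSketch, applyOwnerOnly and demo are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical illustration; not the Hadoop implementation.
final class PermissionCheckSketch {

    /** Tries to apply owner-only (rwx------) permissions; false if any setter reports failure. */
    static boolean applyOwnerOnly(File dir) {
        boolean ok = true;
        // For each permission: clear it for everyone, then grant it to the owner only.
        ok &= dir.setReadable(false, false);
        ok &= dir.setReadable(true, true);
        ok &= dir.setWritable(false, false);
        ok &= dir.setWritable(true, true);
        ok &= dir.setExecutable(false, false);
        ok &= dir.setExecutable(true, true);
        return ok;
    }

    /** Creates a temporary directory, applies the permissions, and cleans up. */
    static boolean demo() {
        try {
            File dir = Files.createTempDirectory("staging-sketch").toFile();
            try {
                return applyOwnerOnly(dir);
            } finally {
                dir.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("all permission setters succeeded: " + demo());
    }
}
```

According to the java.io.File documentation, a setter returns false when the underlying file system cannot express the requested change, for example clearing the read permission where no read permission is implemented. On a POSIX file system all six calls are expected to succeed, which is consistent with the job running normally on Linux.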
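Running the program again prints the counters normally and creates the new output directory. The output directory must not exist beforehand; Hadoop creates it automatically.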
14/03/11 16:08:40 INFO mapred.JobClient: map 100% reduce 100%
14/03/11 16:08:41 INFO mapred.JobClient: Job complete: job_local_0001
14/03/11 16:08:41 INFO mapred.JobClient: Counters: 19
14/03/11 16:08:41 INFO mapred.JobClient: File Output Format Counters
14/03/11 16:08:41 INFO mapred.JobClient: Bytes Written=2154020
14/03/11 16:08:41 INFO mapred.JobClient: FileSystemCounters
14/03/11 16:08:41 INFO mapred.JobClient: FILE_BYTES_READ=631320575
14/03/11 16:08:41 INFO mapred.JobClient: HDFS_BYTES_READ=141910490
14/03/11 16:08:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=774430506
14/03/11 16:08:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2154020
14/03/11 16:08:41 INFO mapred.JobClient: File Input Format Counters
14/03/11 16:08:41 INFO mapred.JobClient: Bytes Read=70955245
14/03/11 16:08:41 INFO mapred.JobClient: Map-Reduce Framework
14/03/11 16:08:41 INFO mapred.JobClient: Reduce input groups=59150
14/03/11 16:08:41 INFO mapred.JobClient: Map output materialized bytes=142981973
14/03/11 16:08:41 INFO mapred.JobClient: Combine output records=0
14/03/11 16:08:41 INFO mapred.JobClient: Map input records=255015
14/03/11 16:08:41 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/11 16:08:41 INFO mapred.JobClient: Reduce output records=59150
14/03/11 16:08:41 INFO mapred.JobClient: Spilled Records=26709860
14/03/11 16:08:41 INFO mapred.JobClient: Map output bytes=128572984
14/03/11 16:08:41 INFO mapred.JobClient: Total committed heap usage (bytes)=305004544
14/03/11 16:08:41 INFO mapred.JobClient: Combine input records=0
14/03/11 16:08:41 INFO mapred.JobClient: Map output records=7201751
14/03/11 16:08:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=99
14/03/11 16:08:41 INFO mapred.JobClient: Reduce input records=7201751