MapReduce on a Big Data Platform
1. On a cluster node, the directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ contains an example JAR, hadoop-mapreduce-examples.jar. Run the pi program in this JAR to compute an approximation of π, using 5 map tasks with 5 samples (dart throws) per map. The output after the run completes is shown below.
[root@master ~]# hadoop jar /usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar pi 5 5
WARNING: Use "yarn jar" to launch YARN applications.
Number of Maps  = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
17/05/07 03:25:16 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/07 03:25:16 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.15:8050
17/05/07 03:25:17 INFO input.FileInputFormat: Total input paths to process : 5
17/05/07 03:25:17 INFO mapreduce.JobSubmitter: number of splits:5
17/05/07 03:25:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494125392913_0001
17/05/07 03:25:19 INFO impl.YarnClientImpl: Submitted application application_1494125392913_0001
17/05/07 03:25:19 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494125392913_0001/
17/05/07 03:25:19 INFO mapreduce.Job: Running job: job_1494125392913_0001
17/05/07 03:25:30 INFO mapreduce.Job: Job job_1494125392913_0001 running in uber mode : false
17/05/07 03:25:30 INFO mapreduce.Job:  map 0% reduce 0%
17/05/07 03:25:36 INFO mapreduce.Job:  map 40% reduce 0%
17/05/07 03:25:41 INFO mapreduce.Job:  map 60% reduce 0%
17/05/07 03:25:42 INFO mapreduce.Job:  map 80% reduce 0%
17/05/07 03:25:45 INFO mapreduce.Job:  map 100% reduce 0%
17/05/07 03:25:48 INFO mapreduce.Job:  map 100% reduce 100%
17/05/07 03:25:49 INFO mapreduce.Job: Job job_1494125392913_0001 completed successfully
17/05/07 03:25:49 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=116
                FILE: Number of bytes written=819237
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1300
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=23
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=5
                Launched reduce tasks=1
                Data-local map tasks=5
                Total time spent by all maps in occupied slots (ms)=50808
                Total time spent by all reduces in occupied slots (ms)=10839
                Total time spent by all map tasks (ms)=16936
                Total time spent by all reduce tasks (ms)=3613
                Total vcore-seconds taken by all map tasks=16936
                Total vcore-seconds taken by all reduce tasks=3613
                Total megabyte-seconds taken by all map tasks=26013696
                Total megabyte-seconds taken by all reduce tasks=5549568
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=90
                Map output materialized bytes=140
                Input split bytes=710
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=140
                Reduce input records=10
                Reduce output records=0
                Spilled Records=20
                Shuffled Maps =5
                Failed Shuffles=0
                Merged Map outputs=5
                GC time elapsed (ms)=450
                CPU time spent (ms)=4330
                Physical memory (bytes) snapshot=5840977920
                Virtual memory (bytes) snapshot=19436744704
                Total committed heap usage (bytes)=5483528192
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=590
        File Output Format Counters
                Bytes Written=97
Job Finished in 32.805 seconds
Estimated value of Pi is 3.680000000000000000
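The pi example works by dart throwing: each map task scatters points over a unit square and counts how many fall inside the inscribed circle; the reducer combines the per-map counts into the estimate 4 × inside / total. With only 5 × 5 = 25 samples the estimate is necessarily coarse, which is why the job reports 3.68 rather than 3.14. A minimal sketch of the same idea in plain Python (the function name is hypothetical, and it uses pseudo-random points where the Hadoop example uses a quasi-Monte Carlo point sequence, so the numbers will not match the job's output exactly):

```python
import random

def estimate_pi(num_maps, samples_per_map, seed=42):
    """Monte Carlo pi: throw darts at the unit square and count
    hits inside the inscribed quarter circle (x^2 + y^2 <= 1)."""
    rng = random.Random(seed)
    inside = 0
    total = num_maps * samples_per_map
    for _ in range(total):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # each hit votes for pi = 4 * inside / total
    return 4.0 * inside / total
```

With 25 samples the result can easily be off by 0.5 or more; accuracy improves roughly with the square root of the sample count, so increasing either argument tightens the estimate.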
2. On a cluster node, the directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ contains an example JAR, hadoop-mapreduce-examples.jar. Run the wordcount program in this JAR to count the words in /1daoyun/file/BigDataSkills.txt, writing the result to the /1daoyun/output directory, then query the word-count result with the appropriate command. The output is shown below.
[root@master ~]# hadoop jar /usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar wordcount /1daoyun/file/BigDataSkills.txt /1daoyun/output
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 03:28:10 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/07 03:28:10 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.15:8050
17/05/07 03:28:11 INFO input.FileInputFormat: Total input paths to process : 1
17/05/07 03:28:12 INFO mapreduce.JobSubmitter: number of splits:1
17/05/07 03:28:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494125392913_0003
17/05/07 03:28:14 INFO impl.YarnClientImpl: Submitted application application_1494125392913_0003
17/05/07 03:28:14 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494125392913_0003/
17/05/07 03:28:14 INFO mapreduce.Job: Running job: job_1494125392913_0003
17/05/07 03:28:24 INFO mapreduce.Job: Job job_1494125392913_0003 running in uber mode : false
17/05/07 03:28:24 INFO mapreduce.Job:  map 0% reduce 0%
17/05/07 03:28:30 INFO mapreduce.Job:  map 100% reduce 0%
17/05/07 03:28:40 INFO mapreduce.Job:  map 100% reduce 100%
17/05/07 03:30:51 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
17/05/07 03:30:52 INFO mapreduce.Job: Job job_1494125392913_0003 completed successfully
17/05/07 03:30:52 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=90
                FILE: Number of bytes written=272541
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=163
                HDFS: Number of bytes written=60
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8748
                Total time spent by all reduces in occupied slots (ms)=12042
                Total time spent by all map tasks (ms)=2916
                Total time spent by all reduce tasks (ms)=4014
                Total vcore-seconds taken by all map tasks=2916
                Total vcore-seconds taken by all reduce tasks=4014
                Total megabyte-seconds taken by all map tasks=4478976
                Total megabyte-seconds taken by all reduce tasks=6165504
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=72
                Map output materialized bytes=90
                Input split bytes=114
                Combine input records=6
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=90
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=109
                CPU time spent (ms)=1870
                Physical memory (bytes) snapshot=1347346432
                Virtual memory (bytes) snapshot=6500098048
                Total committed heap usage (bytes)=1229455360
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=49
        File Output Format Counters
                Bytes Written=60
[root@master ~]# hadoop fs -cat /1daoyun/output/part-r-00000
"duiya  1
hello   1
nisibusisha     1
wosha"  1
zsh     1
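The wordcount logic is the canonical MapReduce pattern: the mapper tokenizes each input line and emits a (word, 1) pair per token, the framework groups the pairs by word, and the reducer sums each group. Note that the tokens are split on whitespace only, which is why the stray quotation marks survive into the output above ("duiya and wosha" count as distinct words). A hedged single-process sketch of the two phases in Python (function names and the sample input are illustrative, not taken from the JAR):

```python
from itertools import groupby

def map_phase(lines):
    # mapper: emit (word, 1) for every whitespace-separated token
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    # shuffle: order pairs by key so equal words are adjacent,
    # then reduce: sum the 1s within each group
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

counts = reduce_phase(map_phase(["hello hadoop", "hadoop mapreduce"]))
```

In the real job the combiner runs the same summing step locally on each mapper's output before the shuffle, which is why the counters above show Combine input records=6.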
3. On a cluster node, the directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ contains an example JAR, hadoop-mapreduce-examples.jar. Run the sudoku program in this JAR to solve the sudoku puzzle shown below. The output after the run completes is as follows.
[root@master ~]# cat puzzle1.dta
8 ? ? ? ? ? ? ? ?
? ? 3 6 ? ? ? ? ?
? 7 ? ? 9 ? 2 ? ?
? 5 ? ? ? 7 ? ? ?
? ? ? ? 4 5 7 ? ?
? ? ? 1 ? ? ? 3 ?
? ? 1 ? ? ? ? 6 8
? ? 8 5 ? ? ? 1 ?
? 9 ? ? ? ? 4 ? ?
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar sudoku /root/puzzle1.dta
WARNING: Use "yarn jar" to launch YARN applications.
Solving /root/puzzle1.dta
8 1 2 7 5 3 6 4 9
9 4 3 6 8 2 1 7 5
6 7 5 4 9 1 2 8 3
1 5 4 2 3 7 8 9 6
3 6 9 8 4 5 7 2 1
2 8 7 1 6 9 5 3 4
5 2 1 9 7 4 3 6 8
4 3 8 5 2 6 9 1 7
7 9 6 3 1 8 4 5 2
Found 1 solutions
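The sudoku example runs locally (no MapReduce job is launched) and searches for every grid consistent with the given clues, which is why it reports "Found 1 solutions". A sketch of equivalent solving logic in Python, using backtracking with a most-constrained-cell heuristic; this is an illustration of the technique, not the JAR's actual implementation. Empty cells (the ? marks) are encoded as 0:

```python
def solve_sudoku(grid):
    """Return a solved copy of a 9x9 grid (0 = empty), or None."""
    grid = [row[:] for row in grid]

    def candidates(r, c):
        # values not already used in this row, column, or 3x3 box
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3)
                            for j in range(bc, bc + 3)}
        return [v for v in range(1, 10) if v not in used]

    # pick the empty cell with the fewest legal values; a cell with
    # zero candidates prunes this branch immediately
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cand = candidates(r, c)
                if best is None or len(cand) < len(best[2]):
                    best = (r, c, cand)
    if best is None:
        return grid                    # no empty cells left: solved
    r, c, cand = best
    for v in cand:
        grid[r][c] = v
        result = solve_sudoku(grid)    # recurse on a copy
        if result is not None:
            return result
    return None                        # dead end: backtrack

# the puzzle from puzzle1.dta, ? rendered as 0
PUZZLE = [
    [8, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 3, 6, 0, 0, 0, 0, 0],
    [0, 7, 0, 0, 9, 0, 2, 0, 0],
    [0, 5, 0, 0, 0, 7, 0, 0, 0],
    [0, 0, 0, 0, 4, 5, 7, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 3, 0],
    [0, 0, 1, 0, 0, 0, 0, 6, 8],
    [0, 0, 8, 5, 0, 0, 0, 1, 0],
    [0, 9, 0, 0, 0, 0, 4, 0, 0],
]
```

Because the puzzle has a unique solution, any correct solver must reproduce the grid printed by the Hadoop job.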
4. On a cluster node, the directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ contains an example JAR, hadoop-mapreduce-examples.jar. Run the grep program in this JAR to count the occurrences of "hadoop" in the file /1daoyun/file/BigDataSkills.txt, then query the statistics once the count completes. What is the output?
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar grep /1daoyun/file/BigDataSkills.txt /output hadoop
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 13:37:06 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/07 13:37:06 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.15:8050
17/05/07 13:37:07 INFO input.FileInputFormat: Total input paths to process : 1
17/05/07 13:37:07 INFO mapreduce.JobSubmitter: number of splits:1
17/05/07 13:37:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494163309183_0003
17/05/07 13:37:07 INFO impl.YarnClientImpl: Submitted application application_1494163309183_0003
17/05/07 13:37:07 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494163309183_0003/
17/05/07 13:37:07 INFO mapreduce.Job: Running job: job_1494163309183_0003
17/05/07 13:37:14 INFO mapreduce.Job: Job job_1494163309183_0003 running in uber mode : false
17/05/07 13:37:14 INFO mapreduce.Job:  map 0% reduce 0%
17/05/07 13:37:23 INFO mapreduce.Job: Task Id : attempt_1494163309183_0003_m_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_e08_1494163309183_0003_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:600)
        at org.apache.hadoop.util.Shell.run(Shell.java:511)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:783)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

17/05/07 13:37:29 INFO mapreduce.Job:  map 100% reduce 0%
17/05/07 13:37:36 INFO mapreduce.Job:  map 100% reduce 100%
17/05/07 13:37:37 INFO mapreduce.Job: Job job_1494163309183_0003 completed successfully
17/05/07 13:37:37 INFO mapreduce.Job: Counters: 51
        File System Counters
                FILE: Number of bytes read=23
                FILE: Number of bytes written=273125
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=146
                HDFS: Number of bytes written=109
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Failed map tasks=1
                Launched map tasks=2
                Launched reduce tasks=1
                Other local map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=33174
                Total time spent by all reduces in occupied slots (ms)=9663
                Total time spent by all map tasks (ms)=11058
                Total time spent by all reduce tasks (ms)=3221
                Total vcore-seconds taken by all map tasks=11058
                Total vcore-seconds taken by all reduce tasks=3221
                Total megabyte-seconds taken by all map tasks=16985088
                Total megabyte-seconds taken by all reduce tasks=4947456
        Map-Reduce Framework
                Map input records=5
                Map output records=2
                Map output bytes=30
                Map output materialized bytes=23
                Input split bytes=114
                Combine input records=2
                Combine output records=1
                Reduce input groups=1
                Reduce shuffle bytes=23
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=107
                CPU time spent (ms)=2070
                Physical memory (bytes) snapshot=1351417856
                Virtual memory (bytes) snapshot=6499807232
                Total committed heap usage (bytes)=1233649664
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=32
        File Output Format Counters
                Bytes Written=109
17/05/07 13:37:37 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/07 13:37:37 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.15:8050
17/05/07 13:37:37 INFO input.FileInputFormat: Total input paths to process : 1
17/05/07 13:37:37 INFO mapreduce.JobSubmitter: number of splits:1
17/05/07 13:37:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494163309183_0004
17/05/07 13:37:38 INFO impl.YarnClientImpl: Submitted application application_1494163309183_0004
17/05/07 13:37:38 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494163309183_0004/
17/05/07 13:37:38 INFO mapreduce.Job: Running job: job_1494163309183_0004
17/05/07 13:37:48 INFO mapreduce.Job: Job job_1494163309183_0004 running in uber mode : false
17/05/07 13:37:48 INFO mapreduce.Job:  map 0% reduce 0%
17/05/07 13:38:02 INFO mapreduce.Job: Task Id : attempt_1494163309183_0004_m_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_e08_1494163309183_0004_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:600)
        at org.apache.hadoop.util.Shell.run(Shell.java:511)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:783)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

17/05/07 13:39:58 INFO mapreduce.Job: Task Id : attempt_1494163309183_0004_m_000000_1, Status : FAILED
Container exited with a non-zero exit code 154

17/05/07 13:40:48 INFO mapreduce.Job:  map 100% reduce 0%
17/05/07 13:40:55 INFO mapreduce.Job:  map 100% reduce 100%
17/05/07 13:40:56 INFO mapreduce.Job: Job job_1494163309183_0004 completed successfully
17/05/07 13:40:56 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=23
                FILE: Number of bytes written=272059
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=236
                HDFS: Number of bytes written=9
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8769
                Total time spent by all reduces in occupied slots (ms)=12570
                Total time spent by all map tasks (ms)=2923
                Total time spent by all reduce tasks (ms)=4190
                Total vcore-seconds taken by all map tasks=2923
                Total vcore-seconds taken by all reduce tasks=4190
                Total megabyte-seconds taken by all map tasks=4489728
                Total megabyte-seconds taken by all reduce tasks=6435840
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=15
                Map output materialized bytes=23
                Input split bytes=127
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=23
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=122
                CPU time spent (ms)=1840
                Physical memory (bytes) snapshot=1351442432
                Virtual memory (bytes) snapshot=6500921344
                Total committed heap usage (bytes)=1233125376
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=109
        File Output Format Counters
                Bytes Written=9
[root@master hadoop-mapreduce]# hadoop fs -cat /output/part-r-00000
2       hadoop
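The transcript shows two chained jobs because the grep example first counts regex matches and then runs a second job that sorts the (count, match) pairs by decreasing count, which is why the final line reads "2  hadoop" with the count first. Note the matching is case-sensitive: the command greps the lowercase pattern hadoop. A hedged single-process sketch of both phases (the function name and sample input are illustrative assumptions):

```python
import re
from collections import Counter

def grep_count(lines, pattern):
    """Count regex matches per matched string, then sort by
    descending count -- mirroring the grep example's two jobs."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # job 1: map each match to (match, 1) and sum per match
        counts.update(regex.findall(line))
    # job 2: order by descending count, output (count, match)
    return sorted(((n, m) for m, n in counts.items()), reverse=True)
```

On a file containing the pattern twice this yields a single pair (2, "hadoop"), matching the part-r-00000 contents above.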