Hadoop Installation and Configuration (Fully Distributed Mode)
2014-06-11 12:55
-------------------------------------------------
I. Introduction
II. Environment
III. Configuration
1. Configure the hosts file and hostnames
2. Create the hadoop service account
3. Set up passwordless SSH login
4. Install the JDK
5. Install Hadoop
6. Configure Hadoop
7. Configure the masters and slaves files
8. Copy Hadoop to each node
9. Format the NameNode
10. Start Hadoop
11. Verify the daemons with jps
IV. Testing
V. Web UI Monitoring
-------------------------------------------------
I. Introduction
Hadoop is an open-source distributed computing platform under the Apache Software Foundation. Built around the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce), Hadoop gives users a distributed infrastructure that hides the low-level details of the underlying system.
A Hadoop cluster has two classes of roles: master and slave. An HDFS cluster consists of a single NameNode and a number of DataNodes. The NameNode, as the master server, manages the filesystem namespace and client access to the filesystem; the DataNodes manage the data stored on their nodes. The MapReduce framework consists of a single JobTracker running on the master node and a TaskTracker running on each slave node. The master schedules all the tasks that make up a job across the slaves, monitors their execution, and re-runs failed tasks; the slaves only execute the tasks the master assigns them. When a job is submitted, the JobTracker receives the job and its configuration, distributes the configuration to the slaves, and schedules the tasks while monitoring the TaskTrackers.
Together, HDFS and MapReduce form the core of the Hadoop architecture: HDFS provides the distributed filesystem across the cluster, and MapReduce provides distributed computation and task processing on top of it. HDFS supplies file storage and access during MapReduce processing; MapReduce distributes, tracks, and executes tasks over HDFS and collects the results. The two cooperate to carry out the main work of a Hadoop cluster.
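The division of labor described above can be caricatured with ordinary shell pipelines (a sketch only, not Hadoop code): the map stage emits key-value pairs, a sort stands in for the shuffle, and the reduce stage aggregates per key.

```shell
# A single-machine caricature of the MapReduce dataflow: "map" emits
# "word 1" pairs, sort plays the role of the shuffle (grouping equal
# keys together), and "reduce" sums the counts per key.
map()    { tr -s ' ' '\n' | sed 's/$/ 1/'; }
reduce() { awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}'; }
echo "to be or not to be" | map | sort | reduce | sort
```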
II. Environment
1. Software versions
OS: CentOS 6.4, 32-bit
JDK: jdk-7u45-linux-i586.rpm
Hadoop: hadoop-0.20.2.tar.gz
2. Node roles
192.168.2.101 master (NameNode, SecondaryNameNode, JobTracker)
192.168.2.102 slave1 (DataNode, TaskTracker)
192.168.2.103 slave2 (DataNode, TaskTracker)
3. Glossary
---- HDFS ----
HDFS (Hadoop Distributed File System): Hadoop's distributed filesystem.
NameNode: the HDFS master server; maintains the filesystem namespace and the block metadata reported by the DataNodes.
DataNode: an HDFS data node; stores blocks and reports them to the NameNode.
SecondaryNameNode: periodically checkpoints the NameNode's namespace image and edit log; it is not a hot standby.
---- MapReduce ----
TaskTracker: launches and manages the map and reduce subtasks on a node.
JobTracker: Hadoop's MapReduce scheduler; communicates with the TaskTrackers to assign tasks and track their progress.
III. Configuration
1. Configure the hosts file and hostname (required on every node; the master is shown)
# vim /etc/hosts
192.168.2.101 master
192.168.2.102 slave1
192.168.2.103 slave2
# vim /etc/sysconfig/network    //on the other two nodes set HOSTNAME to slave1 / slave2
HOSTNAME=master
2. Create the hadoop service account (on every node)
# useradd hadoop
# passwd hadoop
3. Set up passwordless SSH login
3-1. Verify that openssh and rsync are installed (on every node)
# rpm -qa | grep openssh
openssh-5.3p1-84.1.el6.i686
openssh-server-5.3p1-84.1.el6.i686
openssh-clients-5.3p1-84.1.el6.i686
openssh-askpass-5.3p1-84.1.el6.i686
# rpm -qa | grep rsync
rsync-3.0.6-9.el6.i686
3-2. Edit the sshd configuration (on every node)
# vim /etc/ssh/sshd_config    //uncomment lines 47-49
47 RSAAuthentication yes
48 PubkeyAuthentication yes
49 AuthorizedKeysFile .ssh/authorized_keys
# service sshd restart
Stopping sshd: [ OK ]
Starting sshd: [ OK ]
3-3. Allow the master to log in to every slave without a password
# su - hadoop    //switch to the hadoop user (on the master)
$ ssh-keygen -t rsa    //generate a key pair; press Enter at each prompt for the default path and an empty passphrase
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
2e:5b:d0:e1:8e:8b:c9:14:81:e4:6b:7b:ef:20:0d:09 hadoop@master
$ cd .ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys    //restrict permissions; sshd rejects key files readable by others
$ ll
-rw-------. 1 hadoop hadoop  395 Jun 10 18:40 authorized_keys
-rw-------. 1 hadoop hadoop 1679 Jun 10 18:39 id_rsa
-rw-r--r--. 1 hadoop hadoop  395 Jun 10 18:39 id_rsa.pub
$ ssh localhost    //verify from the master itself
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 85:a0:dd:ce:31:7a:c3:94:85:7c:9e:2d:20:f8:2d:2d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
# su - hadoop    //on every slave host; slave1 is shown
$ ssh-keygen -t rsa    //press Enter at each prompt: default path, empty passphrase
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
b1:80:fb:b7:4a:e6:88:d4:48:02:04:de:c1:b4:2d:03 hadoop@slave1
$ scp ~/.ssh/authorized_keys slave1:/home/hadoop/.ssh/    //run on the master: push its public key to each slave
$ cd .ssh/    //back on slave1: verify the file and lock down its permissions
$ ll
-rw-------. 1 hadoop hadoop  395 Jun 10 18:46 authorized_keys
-rw-------. 1 hadoop hadoop 1671 Jun 10 18:45 id_rsa
-rw-r--r--. 1 hadoop hadoop  395 Jun 10 18:45 id_rsa.pub
$ chmod 600 authorized_keys
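The scp and chmod steps have to be repeated for every slave. A small loop avoids doing it by hand; this is a sketch, and the SLAVES list and the DRY_RUN switch are assumptions added here, not part of the original setup.

```shell
# Push the master's authorized_keys to each slave and fix its permissions.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN= to
# actually execute them from the master as the hadoop user.
SLAVES="slave1 slave2"
DRY_RUN="${DRY_RUN-1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }
for host in $SLAVES; do
  run scp ~/.ssh/authorized_keys "$host:/home/hadoop/.ssh/"
  run ssh "$host" chmod 600 /home/hadoop/.ssh/authorized_keys
done
```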
[hadoop@master ~]$ ssh slave1    //verify the master can reach every slave without a password
[hadoop@slave1 ~]$ hostname
slave1
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@master ~]$ ssh slave2
[hadoop@slave2 ~]$ hostname
slave2
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.
3-4. Allow every slave to log in to the master without a password
$ scp ~/.ssh/id_rsa.pub master:/home/hadoop/    //on every slave host; slave1 is shown
$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys    //on the master: append each slave's public key
$ rm -f ~/id_rsa.pub
[hadoop@slave1 .ssh]$ ssh master    //verify every slave can reach the master
Last login: Tue Jun 10 18:42:00 2014 from localhost.localdomain
[hadoop@master ~]$ hostname
master
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@slave2 .ssh]$ ssh master
Last login: Tue Jun 10 18:43:18 2014 from slave1
[hadoop@master ~]$ hostname
master
[hadoop@master ~]$ exit
logout
Connection to master closed.
4. Install the JDK (on every node, as root)
# rpm -ivh jdk-7u45-linux-i586.rpm
# rpm -ql jdk | less
/usr/java/jdk1.7.0_45    //the JDK install path; needed below
# vim /etc/profile    //add the JDK to the search path
54 JAVA_HOME=/usr/java/jdk1.7.0_45
55 PATH=$PATH:$JAVA_HOME/bin
56 export PATH USER LOGNAME MAIL HISTSIZE HISTCONTROL JAVA_HOME
# . /etc/profile
# java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Client VM (build 24.45-b08, mixed mode, sharing)
5. Install Hadoop (install and configure on the master first, then copy to every slave)
# tar -zxvf hadoop-0.20.2.tar.gz -C /usr    //as root
# cd /usr/
# mv hadoop-0.20.2/ hadoop
# ll
drwxr-xr-x. 12 hadoop hadoop 4096 Feb 19 2010 hadoop
# mkdir /usr/hadoop/tmp
# chown -R hadoop:hadoop /usr/hadoop/tmp
# vim /etc/profile    //on every node, mirroring the JAVA_HOME entries above
54 JAVA_HOME=/usr/java/jdk1.7.0_45
55 HADOOP_HOME=/usr/hadoop
56 PATH=$PATH:$JAVA_HOME/bin
57 PATH=$PATH:$HADOOP_HOME/bin
58 export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME HADOOP_HOME
# . /etc/profile
6. Configure Hadoop (configure on the master first, then copy to every slave)
# cd /usr/hadoop/conf/    //as root
# vim hadoop-env.sh
9 export JAVA_HOME=/usr/java/jdk1.7.0_45    //point at the JDK install path
core-site.xml and hdfs-site.xml configure the HDFS side; core-site.xml and mapred-site.xml configure the MapReduce side.
# vim core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>    <!-- base directory for HDFS data -->
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.2.101:9000</value>    <!-- NameNode address and port -->
    </property>
</configuration>
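Before moving on it is worth confirming what the file actually says. A naive grep pulls out the NameNode address; this is a sketch that assumes the value sits on the line after the property name, as in the layout above, so a real check would use an XML parser instead.

```shell
# Extract the fs.default.name value from core-site.xml; CONF can be
# overridden to point at a copy of the file for testing.
CONF="${CONF:-/usr/hadoop/conf/core-site.xml}"
if [ -f "$CONF" ]; then
  grep -A1 '<name>fs.default.name</name>' "$CONF" | grep -o 'hdfs://[^<]*'
fi
```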
# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>    <!-- number of block replicas; the default is 3 -->
    </property>
</configuration>
# vim mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.2.101:9001</value>    <!-- JobTracker address and port, given as host:port without a URL scheme -->
    </property>
</configuration>
7. Configure the masters and slaves files (still on the master; in production a DNS server and domain names are preferable, but this setup relies on the hosts file, and plain IPs guard against hosts-file mistakes)
# vim masters
192.168.2.101
# vim slaves
192.168.2.102
192.168.2.103
8. Copy Hadoop from the master to each slave node
# scp -r /usr/hadoop/ slave1:/usr/
# scp -r /usr/hadoop/ slave2:/usr/
# chown -R hadoop:hadoop /usr/hadoop/    //on every slave: fix the ownership of the copied tree
9. Format the NameNode
[root@master ~]# su - hadoop
[hadoop@master ~]$ hadoop namenode -format    //if the command is not found, run source /etc/profile first
14/06/10 19:53:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.2.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
14/06/10 19:53:43 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
14/06/10 19:53:43 INFO namenode.FSNamesystem: supergroup=supergroup
14/06/10 19:53:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/06/10 19:53:44 INFO common.Storage: Image file of size 96 saved in 0 seconds.
14/06/10 19:53:44 INFO common.Storage: Storage directory /usr/hadoop/tmp/dfs/name has been successfully formatted.
14/06/10 19:53:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.2.101
************************************************************/
10. Start Hadoop
[hadoop@master bin]$ start-all.sh
starting namenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-namenode-master.out
192.168.2.102: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-slave1.out
192.168.2.103: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-slave2.out
192.168.2.101: starting secondarynamenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out
192.168.2.103: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-slave2.out
192.168.2.102: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-slave1.out
11. Verify the daemons with jps
[hadoop@master ~]$ jps
30758 Jps
28827 NameNode
29018 JobTracker
28954 SecondaryNameNode

[hadoop@slave1 ~]$ jps
27508 TaskTracker
29409 Jps
27436 DataNode

[hadoop@slave2 ~]$ jps
27508 TaskTracker
29523 Jps
27437 DataNode
IV. Testing
1. A simple test (estimating π)
[hadoop@master ~]$ cd /usr/hadoop/
[hadoop@master hadoop]$ hadoop jar hadoop-0.20.2-examples.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/06/14 05:02:13 INFO mapred.FileInputFormat: Total input paths to process : 10
14/06/14 05:02:13 INFO mapred.JobClient: Running job: job_201406132259_0004
14/06/14 05:02:14 INFO mapred.JobClient:  map 0% reduce 0%
14/06/14 05:02:28 INFO mapred.JobClient:  map 20% reduce 0%
14/06/14 05:02:31 INFO mapred.JobClient:  map 40% reduce 0%
14/06/14 05:02:37 INFO mapred.JobClient:  map 80% reduce 0%
14/06/14 05:02:40 INFO mapred.JobClient:  map 80% reduce 26%
14/06/14 05:02:43 INFO mapred.JobClient:  map 100% reduce 26%
14/06/14 05:02:55 INFO mapred.JobClient:  map 100% reduce 100%
14/06/14 05:02:57 INFO mapred.JobClient: Job complete: job_201406132259_0004
14/06/14 05:02:57 INFO mapred.JobClient: Counters: 19
14/06/14 05:02:57 INFO mapred.JobClient:   Job Counters
14/06/14 05:02:57 INFO mapred.JobClient:     Launched reduce tasks=1
14/06/14 05:02:57 INFO mapred.JobClient:     Rack-local map tasks=1
14/06/14 05:02:57 INFO mapred.JobClient:     Launched map tasks=10
14/06/14 05:02:57 INFO mapred.JobClient:     Data-local map tasks=9
14/06/14 05:02:57 INFO mapred.JobClient:   FileSystemCounters
14/06/14 05:02:57 INFO mapred.JobClient:     FILE_BYTES_READ=226
14/06/14 05:02:57 INFO mapred.JobClient:     HDFS_BYTES_READ=1180
14/06/14 05:02:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=826
14/06/14 05:02:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
14/06/14 05:02:57 INFO mapred.JobClient:   Map-Reduce Framework
14/06/14 05:02:57 INFO mapred.JobClient:     Reduce input groups=20
14/06/14 05:02:57 INFO mapred.JobClient:     Combine output records=0
14/06/14 05:02:57 INFO mapred.JobClient:     Map input records=10
14/06/14 05:02:57 INFO mapred.JobClient:     Reduce shuffle bytes=280
14/06/14 05:02:57 INFO mapred.JobClient:     Reduce output records=0
14/06/14 05:02:57 INFO mapred.JobClient:     Spilled Records=40
14/06/14 05:02:57 INFO mapred.JobClient:     Map output bytes=180
14/06/14 05:02:57 INFO mapred.JobClient:     Map input bytes=240
14/06/14 05:02:57 INFO mapred.JobClient:     Combine input records=0
14/06/14 05:02:57 INFO mapred.JobClient:     Map output records=20
14/06/14 05:02:57 INFO mapred.JobClient:     Reduce input records=20
Job Finished in 45.455 seconds
Estimated value of Pi is 3.14800000000000000000
2. Upload local data files and run the word count example (wordcount)
[hadoop@master ~]$ mkdir input
[hadoop@master ~]$ echo "hello word" > input/test1.txt
[hadoop@master ~]$ echo "hello hadoop" > input/test2.txt

[hadoop@master ~]$ cd /usr/hadoop/
[hadoop@master hadoop]$ hadoop dfs -put ~/input test
[hadoop@master hadoop]$ hadoop dfs -ls test/*
-rw-r--r-- 1 hadoop supergroup 11 2014-06-10 20:37 /user/hadoop/test/test1.txt
-rw-r--r-- 1 hadoop supergroup 13 2014-06-10 20:37 /user/hadoop/test/test2.txt

[hadoop@master hadoop]$ hadoop jar hadoop-0.20.2-examples.jar wordcount test out
14/06/10 20:40:25 INFO input.FileInputFormat: Total input paths to process : 2
14/06/10 20:40:26 INFO mapred.JobClient: Running job: job_201406102021_0001
14/06/10 20:40:27 INFO mapred.JobClient:  map 0% reduce 0%
14/06/10 20:40:39 INFO mapred.JobClient:  map 50% reduce 0%
14/06/10 20:40:45 INFO mapred.JobClient:  map 100% reduce 0%
14/06/10 20:40:51 INFO mapred.JobClient:  map 100% reduce 100%
14/06/10 20:40:53 INFO mapred.JobClient: Job complete: job_201406102021_0001
14/06/10 20:40:53 INFO mapred.JobClient: Counters: 18
14/06/10 20:40:53 INFO mapred.JobClient:   Job Counters
14/06/10 20:40:53 INFO mapred.JobClient:     Launched reduce tasks=1
14/06/10 20:40:53 INFO mapred.JobClient:     Rack-local map tasks=1
14/06/10 20:40:53 INFO mapred.JobClient:     Launched map tasks=2
14/06/10 20:40:53 INFO mapred.JobClient:     Data-local map tasks=1
14/06/10 20:40:53 INFO mapred.JobClient:   FileSystemCounters
14/06/10 20:40:53 INFO mapred.JobClient:     FILE_BYTES_READ=54
14/06/10 20:40:53 INFO mapred.JobClient:     HDFS_BYTES_READ=24
14/06/10 20:40:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=178
14/06/10 20:40:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=24
14/06/10 20:40:53 INFO mapred.JobClient:   Map-Reduce Framework
14/06/10 20:40:53 INFO mapred.JobClient:     Reduce input groups=3
14/06/10 20:40:53 INFO mapred.JobClient:     Combine output records=4
14/06/10 20:40:53 INFO mapred.JobClient:     Map input records=2
14/06/10 20:40:53 INFO mapred.JobClient:     Reduce shuffle bytes=60
14/06/10 20:40:53 INFO mapred.JobClient:     Reduce output records=3
14/06/10 20:40:53 INFO mapred.JobClient:     Spilled Records=8
14/06/10 20:40:53 INFO mapred.JobClient:     Map output bytes=40
14/06/10 20:40:53 INFO mapred.JobClient:     Combine input records=4
14/06/10 20:40:53 INFO mapred.JobClient:     Map output records=4
14/06/10 20:40:53 INFO mapred.JobClient:     Reduce input records=4
[hadoop@master hadoop]$ hadoop dfs -ls
drwxr-xr-x - hadoop supergroup 0 2014-06-10 20:40 /user/hadoop/out
drwxr-xr-x - hadoop supergroup 0 2014-06-10 20:37 /user/hadoop/test
[hadoop@master hadoop]$ hadoop dfs -ls ./out
drwxr-xr-x - hadoop supergroup  0 2014-06-10 20:40 /user/hadoop/out/_logs
-rw-r--r-- 1 hadoop supergroup 24 2014-06-10 20:40 /user/hadoop/out/part-r-00000
[hadoop@master hadoop]$ hadoop dfs -cat ./out/*
hadoop 1
hello 2
word 1
cat: Source must be a file.    //harmless: the wildcard also matched the _logs directory
Summary: the result shows "hello" twice and "hadoop" and "word" once each, as expected.
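The same counts can be reproduced with plain shell tools on local copies of the input files, which is a handy cross-check of the job's output (a sketch; it recreates the two test files locally rather than reading them back from HDFS):

```shell
# Recreate the two test files and count words locally, mirroring what
# the wordcount job computed on HDFS.
mkdir -p input
echo "hello word"   > input/test1.txt
echo "hello hadoop" > input/test2.txt
cat input/*.txt | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# → hadoop 1
#   hello 2
#   word 1
```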
V. Web UI Monitoring
1. Monitor the JobTracker by browsing to port 50030 on the JobTracker node (http://master:50030)
2. Monitor the HDFS cluster by browsing to port 50070 on the NameNode node (http://master:50070)
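The same two pages can be probed from the command line, which is convenient on a headless master. This is a sketch: the master hostname and the pre-YARN ports 50030/50070 with their classic jsp pages come from this article's setup and differ in later Hadoop versions.

```shell
# Probe the JobTracker and NameNode web UIs and print OK/FAIL per URL.
for url in http://master:50030/jobtracker.jsp http://master:50070/dfshealth.jsp; do
  if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "OK   $url"
  else
    echo "FAIL $url"
  fi
done
```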
PS: Troubleshooting
Error 1:
14/06/10 19:51:20 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/hadoop/tmp/dfs/name/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1110)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:856)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:948)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
Solution:
# chown -R hadoop:hadoop /usr/hadoop/tmp/    //fix the ownership of the data directory
Error 2: jps shows no NameNode or SecondaryNameNode process
$ jps
29018 JobTracker
29091 Jps
Solution:
# netstat -tupln    //check whether ports 9000 (NameNode) and 9001 (JobTracker) are listening
If they are not, check the NameNode log under /usr/hadoop/logs; a failed format (Error 1) or a stale namespaceID (Error 3) is the usual cause.
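Since the jps check has to be repeated on every node, a small helper makes the comparison against the expected daemon list explicit (a sketch; the check_daemons name is made up here):

```shell
# Read jps output on stdin and verify every expected daemon name appears.
# Usage:  jps | check_daemons NameNode SecondaryNameNode JobTracker
# or:     ssh slave1 jps | check_daemons DataNode TaskTracker
check_daemons() {
  local out missing=0
  out=$(cat)
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "MISSING: $d"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all expected daemons running"
}
```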
Error 3:
After bin/hadoop namenode -format is run a second time, the NameNode on the master may not start, or the DataNode processes on the slaves fail to start. Each run of bin/hadoop namenode -format generates a fresh namespaceID for the NameNode, but the DataNode data under the tmp directory still carries the previous namespaceID; on startup the mismatch prevents the DataNodes from registering. Deleting the "temporary directory" on every node before each format avoids the problem.
Solution (delete on every node):
The "temporary directory" is whatever the hadoop.tmp.dir property in conf/core-site.xml points to; if the property is unset, the default location is used. In this setup it is /usr/hadoop/tmp. Deleting the tmp directory on every host loses nothing permanent: it is recreated on the next run, and the namespaceID generated then is consistent across the cluster.
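The cleanup itself can be scripted from the master. This is a sketch: TMP_DIR and the host list mirror this article's layout, and the added DRY_RUN switch (default 1) only prints the commands so the script can be inspected before it deletes anything; stop the cluster first.

```shell
# Wipe hadoop.tmp.dir on every node, then reformat the NameNode so that
# all nodes agree on the newly generated namespaceID.
TMP_DIR=/usr/hadoop/tmp
HOSTS="master slave1 slave2"
DRY_RUN="${DRY_RUN-1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }
for h in $HOSTS; do
  run ssh "$h" "rm -rf $TMP_DIR/*"
done
run hadoop namenode -format
```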