Hadoop Installation (Debian 8)
2016-07-25 00:00
Environment
Two virtual machine nodes (192.168.100.171 <debian171>, 192.168.100.172 <debian172>)
Debian jessie 8.5
Hadoop 2.7.2
Note: for standalone and pseudo-distributed setups, see the instructions on the official Hadoop website; only the distributed setup is covered here.
Distributed Environment Setup
#Install ssh and rsync
sudo apt-get install ssh rsync
#Configure passwordless SSH login
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Run the commands above on both the 171 and 172 machines, then use scp to copy the other machine's id_dsa.pub.
debian171:
scp surfin@192.168.100.172:/home/surfin/.ssh/id_dsa.pub ~/.ssh/id_dsa_172.pub
cat ~/.ssh/id_dsa_172.pub >> ~/.ssh/authorized_keys
debian172:
scp surfin@192.168.100.171:/home/surfin/.ssh/id_dsa.pub ~/.ssh/id_dsa_171.pub
cat ~/.ssh/id_dsa_171.pub >> ~/.ssh/authorized_keys
Try ssh:
surfin@debian171:~/.ssh$ ssh debian171
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Jul 25 10:17:29 2016 from debian171.surfin.org
surfin@debian171:~$ exit
Note: some tutorials copy authorized_keys directly to the other nodes, which can produce a warning such as: "Warning: Permanently added 'debian171,192.168.100.171' (ECDSA) to the list of known hosts." If this happens, delete the authorized_keys and known_hosts files under .ssh, cat the public keys into authorized_keys again, and try ssh; the warning will not appear on the second ssh attempt.
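The recovery steps in the note above can be sketched as follows (paths assume the default ~/.ssh layout used earlier):

```shell
# Rebuild the key files if host-key warnings persist after copying
# authorized_keys wholesale from another node:
rm -f ~/.ssh/authorized_keys ~/.ssh/known_hosts
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# The first ssh records the host key in known_hosts;
# the second login should then be warning-free.
ssh debian171 exit
```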
#Install hadoop
sudo mkdir -p /usr/local/hadoop
sudo chown -R surfin:surfin /usr/local/hadoop
tar xvf hadoop-2.7.2.tar.gz -C /usr/local/hadoop
Configure hadoop-env.sh, changing the line export JAVA_HOME=${JAVA_HOME}:
# The java implementation to use.
export JAVA_HOME=/usr/local/java/jdk1.8.0_92
Configure core-site.xml, adding the following properties:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://debian171:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
Note: setting hadoop.tmp.dir is strongly recommended. By default it lives under /tmp, so if Linux reboots the NameNode metadata is lost and HDFS must be formatted again.
Configure hdfs-site.xml:
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>debian171:50090</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp/dfs/data</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
Note: dfs.replication defaults to 3; 1 is used here for now. dfs.permissions.enabled defaults to true; since a Windows development environment will be set up later (and the Windows username differs from the Linux one), it is temporarily set to false. Disabling it is not recommended in production.
Configure mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>debian171:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>debian171:19888</value>
</property>
Note: YARN is used for resource management here.
Configure yarn-site.xml:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>debian171</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
Configure slaves:
debian172
Note: make sure /etc/hosts on every machine contains entries for debian171 and debian172.
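A minimal /etc/hosts addition might look like this, using the IPs from the environment section above:

```shell
# Append both cluster hostnames on every node (requires sudo):
echo "192.168.100.171 debian171" | sudo tee -a /etc/hosts
echo "192.168.100.172 debian172" | sudo tee -a /etc/hosts
```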
Copy the files under etc/hadoop on debian171 to the corresponding Hadoop directory on debian172.
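One way to do this copy, assuming the same install path on both nodes and the surfin user from earlier:

```shell
# Push the configuration directory to the worker node:
scp /usr/local/hadoop/hadoop-2.7.2/etc/hadoop/* \
    surfin@debian172:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/
```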
#Format HDFS
bin/hdfs namenode -format
#OK, start hadoop
sbin/start-dfs.sh
#Start yarn
sbin/start-yarn.sh
#Start the jobhistoryserver
sbin/mr-jobhistory-daemon.sh start historyserver
#Check the cluster status
bin/hdfs dfsadmin -report
Configured Capacity: 121511374848 (113.17 GB)
Present Capacity: 111463813120 (103.81 GB)
DFS Remaining: 111463260160 (103.81 GB)
DFS Used: 552960 (540 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.100.172:50010 (debian172.surfin.org)
Hostname: debian172.surfin.org
Decommission Status : Normal
Configured Capacity: 121511374848 (113.17 GB)
DFS Used: 552960 (540 KB)
Non DFS Used: 10047561728 (9.36 GB)
DFS Remaining: 111463260160 (103.81 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jul 25 11:04:23 HKT 2016
Note the Live datanodes count.
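If Live datanodes is 0, a quick check is to confirm the expected daemons are running on each node and look at the DataNode log on the worker (the log path below is an assumption based on the install directory used above):

```shell
# On debian171, expect NameNode, SecondaryNameNode and ResourceManager:
jps
# On debian172, expect DataNode and NodeManager:
ssh debian172 jps
# Inspect the DataNode log for connection or directory errors:
ssh debian172 'tail -n 50 /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-surfin-datanode-*.log'
```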
#View the cluster at http://debian171:8088/cluster/nodes
#Test the cluster
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
Check the result:
bin/hdfs dfs -cat output/*
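Alternatively, the job output can be copied down to the local filesystem first and inspected there:

```shell
# Fetch the output directory from HDFS to the local working directory:
bin/hdfs dfs -get output output
cat output/*
```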
References
http://my.oschina.net/u/2338162/blog/610683
http://www.powerxing.com/install-hadoop-cluster/
http://blog.chinaunix.net/uid-28379399-id-4555364.html