
Hadoop Installation (Debian 8)

2016-07-25
Environment

Two virtual machine nodes (192.168.100.171 <debian171>, 192.168.100.172 <debian172>)

Debian 8.5 (jessie)

Hadoop 2.7.2

Note: single-node and pseudo-distributed setups are covered in the official Hadoop documentation; this post describes only the fully distributed setup.

Setting up the distributed environment

#Install ssh and rsync

sudo apt-get install ssh rsync
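
On Debian 8 the ssh service is started automatically after installation; if in doubt, it can be checked via systemd (a quick sanity check, not part of the original steps):

sudo systemctl status ssh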

#Configure passwordless SSH login

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
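
If passwordless login still prompts for a password afterwards, sshd is often rejecting the keys because of loose permissions; OpenSSH expects roughly the following (a common fix, not from the original post):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys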

Run the commands above on both the 171 and 172 machines, then use scp to copy each other's id_dsa.pub:

debian171:

scp surfin@192.168.100.172:/home/surfin/.ssh/id_dsa.pub ~/.ssh/id_dsa_172.pub
cat ~/.ssh/id_dsa_172.pub >> ~/.ssh/authorized_keys

debian172:

scp surfin@192.168.100.171:/home/surfin/.ssh/id_dsa.pub ~/.ssh/id_dsa_171.pub
cat ~/.ssh/id_dsa_171.pub >> ~/.ssh/authorized_keys

Try ssh:

surfin@debian171:~/.ssh$ ssh debian171

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Jul 25 10:17:29 2016 from debian171.surfin.org
surfin@debian171:~$ exit

Note: some tutorials copy authorized_keys directly to the other nodes, which can trigger a warning like "Warning: Permanently added 'debian171,192.168.100.171' (ECDSA) to the list of known hosts." If that happens, delete the authorized_keys and known_hosts files under .ssh, redo the cat steps, and try ssh again; the warning will not appear on the second ssh. See the sketch below.
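
On debian171 that cleanup boils down to something like this (a sketch; mirror it on debian172 with id_dsa_171.pub):

rm ~/.ssh/authorized_keys ~/.ssh/known_hosts
cat ~/.ssh/id_dsa.pub ~/.ssh/id_dsa_172.pub >> ~/.ssh/authorized_keys
ssh debian172   # answer "yes" once; the second login proceeds without the warning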

#Install Hadoop

sudo mkdir -p /usr/local/hadoop
sudo chown -R surfin:surfin /usr/local/hadoop
tar xvf hadoop-2.7.2.tar.gz -C /usr/local/hadoop
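
Optionally, add HADOOP_HOME to the shell profile so the bin/ and sbin/ commands can be run from any directory (a convenience sketch, not required by the original steps; the commands in this post otherwise assume /usr/local/hadoop/hadoop-2.7.2 as the working directory). Append to ~/.bashrc and log in again:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin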

Edit hadoop-env.sh, changing export JAVA_HOME=${JAVA_HOME} to the actual JDK path:

# The java implementation to use.
export JAVA_HOME=/usr/local/java/jdk1.8.0_92

Edit core-site.xml and add the following properties:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://debian171:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

Note: setting hadoop.tmp.dir is strongly recommended. It defaults to the /tmp directory, so after a Linux reboot the NameNode metadata is lost and HDFS has to be reformatted.
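
Since /usr/local/hadoop is owned by the surfin user (see the chown above), the directory can simply be created up front on both nodes (a sketch; the format step will also create missing subdirectories under a writable parent):

mkdir -p /usr/local/hadoop/hadoop-2.7.2/tmp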

Edit hdfs-site.xml:

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>debian171:50090</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/hadoop-2.7.2/tmp/dfs/data</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

Note: dfs.replication defaults to 3; 1 is used here for now. dfs.permissions.enabled defaults to true; because a Windows development environment will be set up later (where the Windows username differs from the Linux one), it is temporarily set to false. Do not disable it in production.

Edit mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>debian171:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>debian171:19888</value>
</property>

Note: YARN is used to manage resources here.

Edit yarn-site.xml:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>debian171</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Edit slaves:

debian172

Note: make sure /etc/hosts on every machine resolves debian171 and debian172.
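
Given the IPs in the environment section, the entries would look like:

192.168.100.171 debian171
192.168.100.172 debian172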

Copy the files under etc/hadoop on debian171 to the corresponding Hadoop directory on debian172, e.g. with scp as sketched below.
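
A one-liner for this, assuming debian172 uses the same /usr/local/hadoop layout created above:

scp /usr/local/hadoop/hadoop-2.7.2/etc/hadoop/* surfin@debian172:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/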

#Format HDFS

bin/hdfs namenode -format

#OK, start Hadoop

sbin/start-dfs.sh

#Start YARN

sbin/start-yarn.sh

#Start the JobHistoryServer

sbin/mr-jobhistory-daemon.sh start historyserver
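
At this point all daemons should be up. A quick way to verify is jps, which ships with the JDK; given the configuration above, the expected processes are roughly:

# on debian171
jps   # NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer
# on debian172
jps   # DataNode, NodeManager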

#Check the cluster status

bin/hdfs dfsadmin -report
Configured Capacity: 121511374848 (113.17 GB)
Present Capacity: 111463813120 (103.81 GB)
DFS Remaining: 111463260160 (103.81 GB)
DFS Used: 552960 (540 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.100.172:50010 (debian172.surfin.org)
Hostname: debian172.surfin.org
Decommission Status : Normal
Configured Capacity: 121511374848 (113.17 GB)
DFS Used: 552960 (540 KB)
Non DFS Used: 10047561728 (9.36 GB)
DFS Remaining: 111463260160 (103.81 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jul 25 11:04:23 HKT 2016

Note the Live datanodes count: it should match the number of slave nodes (one here).

#View the cluster at http://debian171:8088/cluster/nodes
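
Besides the YARN page above, the other web UIs are served on the Hadoop 2.x default ports (assuming no ports were changed beyond the configs in this post):

http://debian171:50070 (NameNode / HDFS overview)
http://debian171:19888 (MapReduce JobHistory, as set in mapred-site.xml)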



#Test the cluster

bin/hdfs dfs -mkdir -p input
bin/hdfs dfs -put etc/hadoop input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'

View the results:

bin/hdfs dfs -cat output/*
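
Alternatively, the output can be copied back to the local filesystem and inspected there (same result, just fetched locally):

bin/hdfs dfs -get output output
cat output/*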
