
Setting up a Hadoop platform on CentOS 6.5

1. Download hadoop-2.3.0-cdh5.0.0.tar.gz

URL: http://archive.cloudera.com/cdh5/cdh/5/

2. useradd hadoop    (steps 2-4 are run as root)

3. passwd hadoop

(enter and confirm the new password at the prompt)

4. cp hadoop-2.3.0-cdh5.0.0.tar.gz /home/hadoop

5. su hadoop

6. Set up passwordless SSH between all the nodes (see http://blog.csdn.net/qq_30831935/article/details/52311726 for a detailed walkthrough); a minimal sketch follows.
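
A minimal sketch, assuming the hadoop user exists on m1/m2/m3 and the hostnames resolve (repeat on each node so every node can reach every other):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a passphrase-less key pair
ssh-copy-id hadoop@m1                       # append the public key to each node's authorized_keys
ssh-copy-id hadoop@m2
ssh-copy-id hadoop@m3
ssh hadoop@m2 hostname                      # should print "m2" without prompting for a password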

7. Extract the archive and edit the configuration files

tar -zxvf hadoop-2.3.0-cdh5.0.0.tar.gz

cd hadoop-2.3.0-cdh5.0.0/etc/hadoop

vim core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://m1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>


vim slaves (the hostnames of the worker nodes, one per line)

m1
m2
m3


vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>m1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>


vim mapred-site.xml (if it doesn't exist: cp mapred-site.xml.template mapred-site.xml)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>m1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>m1:19888</value>
  </property>
</configuration>


vim yarn-site.xml

<configuration>

  <!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>m1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>m1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>m1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>m1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>m1:8088</value>
  </property>

</configuration>


8. Configure the environment variables (on all 3 nodes)

export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
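
One way to persist these for the hadoop user is appending them to ~/.bashrc (an assumption; use /etc/profile instead if you prefer system-wide settings):

cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source ~/.bashrc    # reload so the hadoop command resolves in the current shell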


Copy the extracted directory to the other nodes:

scp -r hadoop-2.3.0-cdh5.0.0 hadoop@m2:~

scp -r hadoop-2.3.0-cdh5.0.0 hadoop@m3:~

9. hadoop namenode -format    (format HDFS; run once, on the NameNode m1)

10. start-all.sh

If a "JAVA_HOME not found" error appears:

Fix: vim hadoop-env.sh and spell out the full Java path:

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_20
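
Once start-all.sh succeeds, a quick sanity check (not in the original post) is to run jps on each node:

jps
# Expected on m1 (the master, which is also listed in slaves here):
#   NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager
# Expected on m2/m3: DataNode, NodeManager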


11. stop-all.sh

12. Run the bundled example

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.0.0.jar pi 2 2

The first argument (2) is the number of map tasks; the second (2) is the number of samples each map computes. Larger values make the job take longer but give a more precise estimate of pi.
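
For example, a longer and more precise run (these values are just for illustration):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.0.0.jar pi 10 1000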

13. HDFS operations

hadoop fs -ls /                  # list files and directories in HDFS
hadoop fs -put <file> /<dir>     # upload a local file into an HDFS directory
hadoop fs -mkdir /<dir>          # create a directory
hadoop fs -text /<file>          # view a file's contents
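
A quick end-to-end check (the file and directory names here are hypothetical):

hadoop fs -mkdir /input
hadoop fs -put /home/hadoop/words.txt /input
hadoop fs -ls /input
hadoop fs -text /input/words.txt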

14. Open port 8088 in a browser (the YARN ResourceManager web UI, http://m1:8088).

Open port 50070 in a browser (the HDFS NameNode web UI, http://m1:50070).



15. Running your own Hadoop job

hadoop jar **.jar <full class name> <input path> <output path>
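
For instance, the bundled wordcount example run against the /input directory from step 13 (paths are illustrative):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.0.0.jar wordcount /input /output
hadoop fs -text /output/part-r-00000    # inspect the result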

Tags: centos hadoop ssh