
Hadoop Pseudo-Distributed Mode (HDFS)

http://hadoop.apache.org/docs/r2.8.2/    (official docs, for reference)

Deployment modes:
1. Standalone mode: a single Java process, mainly used for debugging; works right after download (usually skipped)
2. Pseudo-Distributed Mode: for development/learning; multiple Java processes on one machine (the focus here)
3. Cluster Mode: for production; multiple machines, multiple Java processes. http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Pseudo-distributed deployment: HDFS
    1. Create a user for the Hadoop service
        [root@rzdatahadoop002 software]# useradd hadoop
        [root@rzdatahadoop002 software]# id hadoop
        uid=501(hadoop) gid=501(hadoop) groups=501(hadoop)
        [root@rzdatahadoop002 software]# vi /etc/sudoers

        hadoop  ALL=(root)      NOPASSWD:ALL        grant the hadoop user passwordless sudo
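
        A quick check (illustrative session) that the sudoers entry took effect:
        su - hadoop
        sudo -l        # should list: (root) NOPASSWD: ALL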
    2. Deploy Java

        Use Oracle JDK 1.8 (avoid OpenJDK where possible). After deploying, run which java to confirm the binary and its permissions are correct.
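
        A minimal sketch of the environment setup, assuming the JDK is unpacked to /usr/java/jdk1.8.0_45 (the path that appears later in this note); append to /etc/profile and re-source it:
        export JAVA_HOME=/usr/java/jdk1.8.0_45
        export PATH=$JAVA_HOME/bin:$PATH
        # verify afterwards:
        which java    # expect /usr/java/jdk1.8.0_45/bin/java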
    3. Deploy the SSH service and configure the user's SSH trust (passwordless login)
        service sshd status    check sshd's status

        Generate your own key pair: ssh-keygen
            id_rsa is the private key, id_rsa.pub the public key
            cat id_rsa.pub > authorized_keys    establish the trust relationship

        [root@rzdatahadoop002 ~]# service sshd status
        openssh-daemon (pid  1386) is running...

        [root@rzdatahadoop002 ~]# 
            Q: When running a command over SSH, it asks an extra "yes or no". Which file in which directory records this?

            A: /root/.ssh/known_hosts
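
            To suppress that yes/no prompt in scripts, one common option (illustrative one-off) is:
            ssh -o StrictHostKeyChecking=no localhost date    # auto-accepts the host key, then runs a test command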

    4. Unpack Hadoop

            [root@rzdatahadoop002 software]# tar -xzvf hadoop-2.8.1.tar.gz
            ln -s /opt/software/hadoop-2.8.1 hadoop    create a symlink

            chown -R hadoop:hadoop folder --> changes the folder and everything inside it
            chown -R hadoop:hadoop symlinked-folder --> changes only the symlink itself, not what is inside
            chown -R hadoop:hadoop symlinked-folder/* --> leaves the symlink untouched, changes only what is inside
            chown -R hadoop:hadoop hadoop-2.8.1 --> changes the original folder
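
            A quick way to see what each chown variant actually touched (illustrative session in /opt/software):
            ls -ld hadoop          # ownership of the symlink itself
            ls -ld hadoop-2.8.1    # ownership of the real directory
            ls -l  hadoop/         # ownership of the files reached through the link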

            Additional note:
                A service's data directory sits on disk A (500 GB) with only 10 GB left: /a/dfs/data
                Disk B (2 TB) is added.
                1. On disk A: mv /a/dfs /b/
                2. On disk B: ln -s /b/dfs /a
                3. Check (and fix) the user/group ownership of the folders on both disks
            In short: move disk A's data over to disk B, then add a symlink so /a still resolves to the content now living on disk B. A consolidated sketch follows.
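
            Putting the note together as one sketch, assuming the service writing to /a/dfs is stopped first and that the service user is hadoop as in this note:
            mv /a/dfs /b/
            ln -s /b/dfs /a                   # /a/dfs now resolves to /b/dfs
            chown -R hadoop:hadoop /b/dfs     # fix ownership on the real data
            ls -ld /a/dfs /b/dfs              # verify the link and the ownership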
    

            [root@rzdatahadoop002 software]# cd hadoop
            [root@rzdatahadoop002 hadoop]# rm -f *.txt
            [root@rzdatahadoop002 hadoop]# ll
            total 28
            drwxr-xr-x. 2 hadoop hadoop 4096 Dec 10 11:54 bin
            drwxr-xr-x. 3 hadoop hadoop 4096 Dec 10 11:54 etc
            drwxr-xr-x. 2 hadoop hadoop 4096 Dec 10 11:54 include
            drwxr-xr-x. 3 hadoop hadoop 4096 Dec 10 11:54 lib
            drwxr-xr-x. 2 hadoop hadoop 4096 Dec 10 11:54 libexec
            drwxr-xr-x. 2 hadoop hadoop 4096 Dec 10 11:54 sbin
            drwxr-xr-x. 3 hadoop hadoop 4096 Dec 10 11:54 share

                bin: command-line tools
                etc: configuration files
                sbin: scripts that start and stop the Hadoop daemons

                Key files under etc/hadoop:
                hadoop-env.sh : Hadoop environment settings
                core-site.xml : Hadoop core configuration
                hdfs-site.xml : HDFS service configuration --> starts daemons
                [mapred-site.xml : configuration needed for MapReduce jobs] only used when running jar jobs, which execute on YARN
                yarn-site.xml : YARN service configuration --> starts daemons

                slaves: hostnames of the cluster machines

6. Configure SSH trust for the hadoop user
[hadoop@rzdatahadoop002 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
5b:07:ff:e5:82:85:f3:41:32:f3:80:05:c9:57:0f:e9 hadoop@rzdatahadoop002
The key's randomart image is:
+--[ RSA 2048]----+
|         ..o..o. |
|          oo. .o |
|          o.=.. .|
|           o OE  |
|        S . = + .|
|         o . * + |
|        .   . + .|
|               . |
|                 |
+-----------------+
[hadoop@rzdatahadoop002 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@rzdatahadoop002 ~]$ chmod 0600 ~/.ssh/authorized_keys
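
Before starting HDFS, a quick test that passwordless SSH works (the very first connection may still ask to confirm the host key):
[hadoop@rzdatahadoop002 ~]$ ssh localhost date    # should print the date with no password prompt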

7. Format the NameNode
[hadoop@rzdatahadoop002 hadoop]$ bin/hdfs namenode -format
17/12/13 22:22:04 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
17/12/13 22:22:04 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
17/12/13 22:22:04 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
17/12/13 22:22:04 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/12/13 22:22:04 INFO util.ExitUtil: Exiting with status 0
17/12/13 22:22:04 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at rzdatahadoop002/192.168.137.201
************************************************************/

Storage directory: /tmp/hadoop-hadoop/dfs/name    do not keep this under /tmp, since /tmp gets cleaned out
1. Which configuration sets this default storage path?
2. What does "hadoop-hadoop" mean?
core-site.xml
hadoop.tmp.dir: /tmp/hadoop-${user.name}    (so "hadoop-hadoop" is "hadoop-" plus the current user name, here hadoop)
hdfs-site.xml
dfs.namenode.name.dir : file://${hadoop.tmp.dir}/dfs/name
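
To keep the metadata out of /tmp, a minimal sketch is to override hadoop.tmp.dir in core-site.xml; /home/hadoop/tmp below is a hypothetical path, and the NameNode must be re-formatted (or the data migrated) after the change:
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/tmp</value>
    </property>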

8. Start the HDFS service
[hadoop@rzdatahadoop002 sbin]$ ./start-dfs.sh
Starting namenodes on [localhost]
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 9a:ea:f5:06:bf:de:ca:82:66:51:81:fe:bf:8a:62:36.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 9a:ea:f5:06:bf:de:ca:82:66:51:81:fe:bf:8a:62:36.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
[hadoop@rzdatahadoop002 sbin]$ ps -ef|grep hadoop
root     11292 11085  0 21:59 pts/1    00:00:00 su - hadoop
hadoop   11293 11292  0 21:59 pts/1    00:00:00 -bash
hadoop   11822 11293  0 22:34 pts/1    00:00:00 ps -ef
hadoop   11823 11293  0 22:34 pts/1    00:00:00 grep hadoop
[hadoop@rzdatahadoop002 sbin]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_45
So JAVA_HOME is set in the shell, yet HDFS still fails to start: start-dfs.sh launches each daemon over ssh, and those non-interactive shells do not inherit the login environment, so JAVA_HOME has to be set explicitly in hadoop-env.sh.

[hadoop@rzdatahadoop002 sbin]$ vi ../etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_45

[hadoop@rzdatahadoop002 sbin]$ ./start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-rzdatahadoop002.out
localhost: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-rzdatahadoop002.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-rzdatahadoop002.out

namenode (name node): localhost
datanode (data node): localhost
secondary namenode: 0.0.0.0
Web UI: http://localhost:50070/    (default HTTP port: 50070)

RPC address (fs.defaultFS): localhost:9000    (this is the filesystem port, not a web page)
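
The three daemons can also be confirmed with jps (PIDs below are illustrative):
[hadoop@rzdatahadoop002 sbin]$ jps
12066 NameNode
12187 DataNode
12371 SecondaryNameNode
12489 Jps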

9. Use the command-line tools (hadoop, hdfs)
[hadoop@rzdatahadoop002 bin]$ ./hdfs dfs -mkdir /user
[hadoop@rzdatahadoop002 bin]$ ./hdfs dfs -mkdir /user/hadoop

[hadoop@rzdatahadoop002 bin]$ echo "123456" > rz.log
[hadoop@rzdatahadoop002 bin]$ ./hadoop fs -put rz.log hdfs://localhost:9000/
[hadoop@rzdatahadoop002 bin]$ 
[hadoop@rzdatahadoop002 bin]$ ./hadoop fs -ls hdfs://localhost:9000/
Found 2 items
-rw-r--r--   1 hadoop supergroup          7 2017-12-13 22:56 hdfs://localhost:9000/rz.log
drwxr-xr-x   - hadoop supergroup          0 2017-12-13 22:55 hdfs://localhost:9000/user

[hadoop@rzdatahadoop002 bin]$ ./hadoop fs -ls /
Found 2 items
-rw-r--r--   1 hadoop supergroup          7 2017-12-13 22:56 hdfs://localhost:9000/rz.log
drwxr-xr-x   - hadoop supergroup          0 2017-12-13 22:55 hdfs://localhost:9000/user
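
Reading the file back confirms the 7 bytes ("123456" plus a newline):
[hadoop@rzdatahadoop002 bin]$ ./hadoop fs -cat /rz.log
123456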

10. Change hdfs://localhost:9000 to hdfs://192.168.137.201:9000
[hadoop@rzdatahadoop002 bin]$ ../sbin/stop-dfs.sh 

[hadoop@rzdatahadoop002 bin]$ vi ../etc/hadoop/core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.137.201:9000</value>
    </property>
</configuration>

[hadoop@rzdatahadoop002 bin]$ ./hdfs namenode -format
[hadoop@rzdatahadoop002 bin]$ ../sbin/start-dfs.sh 
Starting namenodes on [rzdatahadoop002]
rzdatahadoop002: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-rzdatahadoop002.out
localhost: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-rzdatahadoop002.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-rzdatahadoop002.out

[hadoop@rzdatahadoop002 bin]$ netstat -nlp|grep 9000
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.137.201:9000        0.0.0.0:*                   LISTEN      14974/java          
[hadoop@rzdatahadoop002 bin]$ 
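Another quick confirmation that the new address took effect (hdfs getconf reads the effective configuration):
[hadoop@rzdatahadoop002 bin]$ ./hdfs getconf -confKey fs.defaultFS
hdfs://192.168.137.201:9000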

11. Change the HDFS daemons so they all start as rzdatahadoop002
namenode: rzdatahadoop002
datanode: localhost
secondarynamenode: 0.0.0.0

For the datanode, edit slaves:
[hadoop@rzdatahadoop002 hadoop]$ vi slaves
rzdatahadoop002

For the secondarynamenode, edit hdfs-site.xml:
[hadoop@rzdatahadoop002 hadoop]$ vi hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
            <name>dfs.replication</name>
            <value>1</value>
    </property>

        <property>
                 <name>dfs.namenode.secondary.http-address</name>
                 <value>rzdatahadoop002:50090</value>
        </property>
        <property>
                 <name>dfs.namenode.secondary.https-address</name>
                 <value>rzdatahadoop002:50091</value>
        </property>
</configuration>

"hdfs-site.xml" 35L, 1173C written

[hadoop@rzdatahadoop002 hadoop]$ cd ../../sbin

[hadoop@rzdatahadoop002 sbin]$ ./stop-dfs.sh
[hadoop@rzdatahadoop002 sbin]$ ./start-dfs.sh 
Starting namenodes on [rzdatahadoop002]
rzdatahadoop002: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-rzdatahadoop002.out
rzdatahadoop002: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-rzdatahadoop002.out
Starting secondary namenodes [rzdatahadoop002]
rzdatahadoop002: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-rzdatahadoop002.out
[hadoop@rzdatahadoop002 sbin]$ 
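
An illustrative check that the secondary namenode now binds to the configured 50090 address (the PID here is made up):
[hadoop@rzdatahadoop002 sbin]$ netstat -nlp | grep 50090
tcp        0      0 192.168.137.201:50090       0.0.0.0:*                   LISTEN      15210/java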

----------------------------------

Open questions:
1. The HDFS data directories across multiple machines are unbalanced. What can be done?
2. The data across multiple disks on a single machine is unbalanced. What can be done?
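
For question 1, HDFS ships a balancer; a minimal sketch (the threshold is the allowed percentage deviation from average utilization):
sbin/start-balancer.sh -threshold 10    # rebalance DataNodes to within 10% of the cluster average
For question 2, Hadoop 2.x has no built-in intra-node disk balancer (one was added in later releases), so it is left here as an open question.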

Homework for 2017-12-13:
1. Hadoop docs: http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/SingleCluster.html http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

2. Deploy Hadoop: HDFS

3. Work through steps 10 and 11 above

4. Extension:
Configure mutual SSH trust between hadoop001 and hadoop002; in plain terms, log in to the other host and run commands there without typing a password. http://blog.itpub.net/30089851/viewspace-1992210/
(Tip: when viewing keys with more and copying them through Notepad, watch out for stray line breaks.)

Solution:
On hadoop002:
scp id_rsa.pub hadoop001:/root/.ssh/id_rsa.pub002
On hadoop001:

cat /root/.ssh/id_rsa.pub002 >> /root/.ssh/authorized_keys
(Repeat in the opposite direction to make the trust mutual.)
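
An equivalent shortcut, assuming ssh-copy-id is installed on hadoop002:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoop001    # appends the public key to hadoop001's authorized_keys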

The java "latest" symlink problem that cost a whole evening is now solved.