
Alex's Hadoop Tutorial for Beginners, Lesson 18: Accessing HDFS over HTTP with HttpFs

2015-02-10 11:17
Original article: http://blog.csdn.net/nsrainbow/article/details/43678237 . Follow the original author's blog for the latest lessons and a better reading experience.

Note

This article is based on CentOS 6.x + CDH 5.x.

What Is HttpFs For?

HttpFs does two things:

Through HttpFs you can manage the files on HDFS from a browser.
HttpFs also provides a REST-style API for managing HDFS.
It is a simple tool, really, but a very practical one.

Installing HttpFs

Pick a machine in the cluster that can access HDFS and install HttpFs on it:
$ sudo yum install hadoop-httpfs
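
The package also registers a hadoop-httpfs service and ships its own configuration directory (on CDH this is typically /etc/hadoop-httpfs/conf; treat the exact path as version-dependent). To see what the package installed:
$ rpm -ql hadoop-httpfs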

Configuration

Edit /etc/hadoop/conf/core-site.xml:
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
These two properties define which hosts and user groups may act through the httpfs proxy user; * means no restriction.
After changing the configuration, restart Hadoop.
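
With a package-based CDH install, restarting Hadoop means restarting the HDFS daemons so the new proxy-user settings take effect. The exact service names depend on which roles run on each node, so treat the following as an example:
$ sudo service hadoop-hdfs-namenode restart
$ sudo service hadoop-hdfs-datanode restart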

Starting HttpFs

$ sudo service hadoop-httpfs start
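
A quick smoke test to confirm HttpFs is listening on its default port 14000 is to ask for the home directory of the httpfs user; per the WebHDFS spec the response should look something like this:
$ curl "http://host2:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=httpfs"
{"Path":"/user/httpfs"}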

Using HttpFs

Open a browser and visit http://host2:14000/webhdfs/v1?op=LISTSTATUS&user.name=httpfs and you will see:
{
  "FileStatuses": {
    "FileStatus": [
      {
        "pathSuffix": "hbase",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hbase",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423446940595,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "tmp",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hdfs",
        "group": "hadoop",
        "permission": "1777",
        "accessTime": 0,
        "modificationTime": 1423122488037,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "user",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hdfs",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423529997937,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "var",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hdfs",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1422945036465,
        "blockSize": 0,
        "replication": 0
      }
    ]
  }
}


The &user.name=httpfs parameter means the request runs as the default user httpfs; this default user has no password.

webhdfs/v1 is the root of every HttpFs URL; the part of the path after it is the HDFS path being operated on.
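
So, for example, to query the status of a single path instead of listing a directory (GETFILESTATUS is one of the operations listed further below):
$ curl "http://host2:14000/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=httpfs"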

Visit http://host2:14000/webhdfs/v1/user?op=LISTSTATUS&user.name=httpfs and you will see:
{
  "FileStatuses": {
    "FileStatus": [
      {
        "pathSuffix": "cloudera",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423472508868,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "hdfs",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hdfs",
        "group": "hadoop",
        "permission": "700",
        "accessTime": 0,
        "modificationTime": 1422947019504,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "history",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "mapred",
        "group": "hadoop",
        "permission": "1777",
        "accessTime": 0,
        "modificationTime": 1422945692887,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "hive",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hive",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423123187569,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "hive_people",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423216966453,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "hive_people2",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423222237254,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "impala",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423475272189,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "root",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "700",
        "accessTime": 0,
        "modificationTime": 1423221719835,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "spark",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "spark",
        "group": "spark",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423530243396,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "sqoop",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "hdfs",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423127462911,
        "blockSize": 0,
        "replication": 0
      },
      {
        "pathSuffix": "test_hive",
        "type": "DIRECTORY",
        "length": 0,
        "owner": "root",
        "group": "hadoop",
        "permission": "755",
        "accessTime": 0,
        "modificationTime": 1423215687891,
        "blockSize": 0,
        "replication": 0
      }
    ]
  }
}


Oddly, HttpFs itself has very little documentation of its own; for the full set of commands you have to look at the WebHDFS REST API documentation.
Supported operations:



HTTP GET
OPEN (see FileSystem.open)
GETFILESTATUS (see FileSystem.getFileStatus)
LISTSTATUS (see FileSystem.listStatus)
GETCONTENTSUMMARY (see FileSystem.getContentSummary)
GETFILECHECKSUM (see FileSystem.getFileChecksum)
GETHOMEDIRECTORY (see FileSystem.getHomeDirectory)
GETDELEGATIONTOKEN (see FileSystem.getDelegationToken)

HTTP PUT
CREATE (see FileSystem.create)
MKDIRS (see FileSystem.mkdirs)
RENAME (see FileSystem.rename)
SETREPLICATION (see FileSystem.setReplication)
SETOWNER (see FileSystem.setOwner)
SETPERMISSION (see FileSystem.setPermission)
SETTIMES (see FileSystem.setTimes)
RENEWDELEGATIONTOKEN (see DistributedFileSystem.renewDelegationToken)
CANCELDELEGATIONTOKEN (see DistributedFileSystem.cancelDelegationToken)

HTTP POST
APPEND (see FileSystem.append)

HTTP DELETE
DELETE (see FileSystem.delete)
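
Two of these in curl form, following the same pattern as the examples below; the paths here are only illustrations:
$ curl "http://host2:14000/webhdfs/v1/user?op=GETCONTENTSUMMARY&user.name=httpfs"
$ curl -X PUT "http://host2:14000/webhdfs/v1/user/abc?op=RENAME&destination=/user/abc2&user.name=httpfs"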

Creating a Directory

Try creating a directory named abc:
[root@host2 hadoop-httpfs]# curl -i -X PUT "http://host2:14000/webhdfs/v1/user/abc?op=MKDIRS&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423573951025&s=Ab44ha1Slg1f4xCrK+x4R/s1eMY="; Path=/; Expires=Tue, 10-Feb-2015 13:12:31 GMT; HttpOnly
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 10 Feb 2015 03:12:36 GMT

{"boolean":true}
Then check the result with the hdfs dfs -ls command on the server:
[root@host2 conf]# hdfs dfs -ls /user
Found 12 items
drwxr-xr-x   - httpfs hadoop          0 2015-02-10 11:12 /user/abc
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:01 /user/cloudera
drwx------   - hdfs   hadoop          0 2015-02-03 15:03 /user/hdfs
drwxrwxrwt   - mapred hadoop          0 2015-02-03 14:41 /user/history
drwxr-xr-x   - hive   hadoop          0 2015-02-05 15:59 /user/hive
drwxr-xr-x   - root   hadoop          0 2015-02-06 18:02 /user/hive_people
drwxr-xr-x   - root   hadoop          0 2015-02-06 19:30 /user/hive_people2
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:47 /user/impala
drwx------   - root   hadoop          0 2015-02-06 19:21 /user/root
drwxr-xr-x   - spark  spark           0 2015-02-10 09:04 /user/spark
drwxr-xr-x   - hdfs   hadoop          0 2015-02-05 17:11 /user/sqoop
drwxr-xr-x   - root   hadoop          0 2015-02-06 17:41 /user/test_hive
You can see that a directory named abc, owned by httpfs, has been created.

Opening a File

From the server side, upload a text file test.txt to the /user/abc directory with the following content:
Hello World!
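
The upload itself can be done with hdfs dfs -put test.txt /user/abc. It could also go through HttpFs using the CREATE operation; as a sketch, WebHDFS uses a two-step flow where the first PUT returns an HTTP 307 redirect and the second PUT (with data=true on HttpFs) sends the actual bytes:
$ curl -i -X PUT "http://host2:14000/webhdfs/v1/user/abc/test.txt?op=CREATE&user.name=httpfs"
$ curl -i -X PUT -T test.txt -H "Content-Type: application/octet-stream" "http://host2:14000/webhdfs/v1/user/abc/test.txt?op=CREATE&data=true&user.name=httpfs"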


Access it through HttpFs:
[root@host2 hadoop-httpfs]# curl -i -X GET "http://host2:14000/webhdfs/v1/user/abc/test.txt?op=OPEN&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423574166943&s=JTxqIJUsblVBeHVuTs6JCV2UbBs="; Path=/; Expires=Tue, 10-Feb-2015 13:16:06 GMT; HttpOnly
Content-Type: application/octet-stream
Content-Length: 13
Date: Tue, 10 Feb 2015 03:16:07 GMT

Hello World!
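
When you are done, the DELETE operation cleans up the test directory; recursive=true is needed for a non-empty directory, and a successful call should return {"boolean":true}:
$ curl -X DELETE "http://host2:14000/webhdfs/v1/user/abc?op=DELETE&recursive=true&user.name=httpfs"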
Tags: hdfs cloudera hadoop