A pitfall when using the Python hdfs module with Hadoop HDFS
2018-02-02 22:09
When working against WebHDFS, I used the following code:
from hdfs import InsecureClient
client = InsecureClient('http://host:port', user='ann')
List the files under the remote /tmp directory (this step only talks to the NameNode):
# Listing all files inside a directory.
list_content = client.list('/tmp')
To upload a file, call the upload method:
client.upload(remote_dir, local_dir, overwrite=True)
This raised the following exception:
[E 180201 14:21:10 client:599] Error while uploading. Attempting cleanup.
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
    _upload(path_tuple)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
    self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
    consumer(data)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
    **kwargs
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
    low_conn.endheaders()
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
    conn = self._new_conn()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[I 180201 14:21:10 client:848] Deleting '/tmp/data1/tensorflow' recursively.
[E 180201 14:21:10 web:1548] Uncaught exception POST /api/job/create (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:8081', method='POST', uri='/api/job/create', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'close', 'Cookie': 'Pycharm-c9b2eeaf=d1c21794-2128-4ae7-9a97-2f9a04f8749c', 'Content-Length': '34', 'Referer': 'http://localhost:8082/', 'Content-Type': 'application/json;charset=utf-8', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3', 'Accept': 'application/json, text/plain, */*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0', 'Host': 'localhost:8081'})
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\web.py", line 1469, in _execute
    result = yield result
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1015, in run
    value = future.result()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1024, in run
    yielded = self.gen.send(value)
  File "app.py", line 79, in post
    hdfs_client.upload(remote_hdfs_model_dir,model_dir,overwrite=True)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 605, in upload
    raise err
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
    _upload(path_tuple)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
    self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
    consumer(data)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
    **kwargs
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
    low_conn.endheaders()
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
    conn = self._new_conn()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[E 180201 14:21:10 web:1971] 500 POST /api/job/create (127.0.0.1) 128.01ms
[I 180201 14:23:35 autoreload:204] C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py modified; restarting server
169.24.2.194 50070 bigdata6.chinasws.com 50075
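The root error in this log is socket.gaierror: [Errno 11001] getaddrinfo failed, which means the client could not resolve a hostname. A minimal pre-flight check can confirm that every DataNode hostname resolves before you attempt the upload; the hostname and port below are taken from the log above, so substitute your own cluster's values:

```python
import socket

def can_resolve(host, port=50075):
    """Return True if `host` resolves via DNS or the local hosts file.

    This mirrors the lookup that requests/urllib3 performs in
    create_connection() before writing data to a DataNode.
    """
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror:  # [Errno 11001] on Windows, -2/-5 on Linux
        return False

# Check each DataNode hostname before calling client.upload();
# the name below comes from the log and is only an example.
datanodes = ['bigdata6.chinasws.com']
unresolved = [h for h in datanodes if not can_resolve(h)]
if unresolved:
    print('Cannot resolve DataNodes:', unresolved)
```

If any hostname ends up in `unresolved`, an upload through this client will fail exactly as shown in the traceback, even though NameNode-only operations like `client.list` succeed.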
After various tests, the cause appears to be that uploading a file requires connecting to a DataNode to write the actual data, so the HDFS client machine (the one running the hdfs upload code) must have working network access to every DataNode in the cluster. If your HDFS cluster advertises its nodes by hostname, you need to either configure those hostnames on your DNS server or add the IP-to-hostname mappings to the client's local hosts file: /etc/hosts on Linux, or C:\Windows\System32\drivers\etc\hosts on Windows.
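For example, a hosts-file entry would look like the following; the IP address here is hypothetical (use the real address of each DataNode), and the hostname is the one that appears in the log above:

```
# /etc/hosts (Linux) or C:\Windows\System32\drivers\etc\hosts (Windows)
# Hypothetical IP -- map each DataNode's advertised hostname to its real address
10.0.0.6    bigdata6.chinasws.com
```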