
A pitfall when using the Python hdfs module to work with Hadoop HDFS

2018-02-02 22:09
When working with WebHDFS, start by creating a client:

from hdfs import InsecureClient

# host:port is the NameNode's WebHDFS HTTP address (port 50070 by default on Hadoop 2.x)
client = InsecureClient('http://host:port', user='ann')

List the files under the remote /tmp directory (this step only fetches metadata from the NameNode):

# Listing all files inside a directory.
list_content = client.list('/tmp')

When uploading a file, call the upload method:

client.upload(remote_dir, local_dir, overwrite=True)

The following exception is raised:

[E 180201 14:21:10 client:599] Error while uploading. Attempting cleanup.
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
_upload(path_tuple)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
consumer(data)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
data=(c.encode(encoding) for c in _data) if encoding else _data,
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
**kwargs
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
low_conn.endheaders()
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
self.connect()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
conn = self._new_conn()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[I 180201 14:21:10 client:848] Deleting '/tmp/data1/tensorflow' recursively.
[E 180201 14:21:10 web:1548] Uncaught exception POST /api/job/create (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8081', method='POST', uri='/api/job/create', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'close', 'Cookie': 'Pycharm-c9b2eeaf=d1c21794-2128-4ae7-9a97-2f9a04f8749c', 'Content-Length': '34', 'Referer': 'http://localhost:8082/', 'Content-Type': 'application/json;charset=utf-8', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3', 'Accept': 'application/json, text/plain, */*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0', 'Host': 'localhost:8081'})
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\web.py", line 1469, in _execute
result = yield result
File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1015, in run
value = future.result()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1024, in run
yielded = self.gen.send(value)
File "app.py", line 79, in post
hdfs_client.upload(remote_hdfs_model_dir,model_dir,overwrite=True)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 605, in upload
raise err
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
_upload(path_tuple)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
consumer(data)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
data=(c.encode(encoding) for c in _data) if encoding else _data,
File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
**kwargs
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
low_conn.endheaders()
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
self.connect()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
conn = self._new_conn()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[E 180201 14:21:10 web:1971] 500 POST /api/job/create (127.0.0.1) 128.01ms
[I 180201 14:23:35 autoreload:204] C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py modified; restarting server
For reference, the addresses from my environment: NameNode 169.24.2.194:50070 and DataNode bigdata6.chinasws.com:50075. The DataNode hostname is the one the client fails to resolve.


After various tests, the cause is that uploading a file requires the client to connect to a DataNode to write the data, so the HDFS client machine (the one running the hdfs upload code) must have network connectivity to every DataNode. Listing a directory only talks to the NameNode, which is why client.list succeeds while client.upload fails.
If your HDFS cluster addresses its nodes by hostname, those hostnames must be resolvable from the client: either configure them on your DNS server, or add the IP-to-hostname mappings to the client's local hosts file (/etc/hosts on Linux, or C:\windows\system32\drivers\etc\hosts on Windows).
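
To see which hostname the client is being redirected to, the sketch below asks the NameNode where a write would go and then checks whether that DataNode hostname resolves from the client machine. This is only a minimal diagnostic sketch: the NameNode address, user name and probe path are placeholders based on the example above, and the request mirrors the two-step WebHDFS CREATE flow that client.upload performs internally.

import socket
from urllib.parse import urlparse

import requests

# Placeholders: adjust to your cluster. 50070 is the default NameNode
# WebHDFS port on Hadoop 2.x; 'ann' matches the user in the example above.
NAMENODE = 'http://host:50070'
USER = 'ann'
PROBE_PATH = '/tmp/_webhdfs_probe'

# Step 1 of a WebHDFS write: the NameNode answers with a 307 redirect whose
# Location header points at the DataNode that will receive the actual data.
resp = requests.put(
    '{}/webhdfs/v1{}?op=CREATE&user.name={}'.format(NAMENODE, PROBE_PATH, USER),
    allow_redirects=False,
)
location = resp.headers.get('Location', '')
print('NameNode redirects the write to:', location)

# Step 2 would send the file content to that URL, so its hostname must be
# resolvable from this machine. Check that explicitly.
datanode_host = urlparse(location).hostname
try:
    socket.getaddrinfo(datanode_host, None)
    print(datanode_host, 'resolves fine')
except socket.gaierror as err:
    print(datanode_host, 'cannot be resolved:', err)
    print('Add it to DNS or to the local hosts file.')

If the hostname in the Location header fails to resolve, adding the corresponding IP-to-hostname mapping for each DataNode to DNS or to the client's hosts file, as described above, is what makes client.upload succeed.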