Installing Hadoop on Windows
2014-03-23 20:11
1. Install Cygwin.
2. Install the Cygwin components: openssh, openssl, sed, subversion.
3. Add Cygwin/bin and Cygwin/usr/sbin to the Windows PATH.
4. Install sshd. In Cygwin, run ssh-host-config. Answer "no" to "Should privilege separation be used?" and "yes" to "Do you want to install sshd as a service?". Cygwin will also ask whether to create a new Windows user to start the service; the default user it creates is "cyg_server", but it is better to use the current domain user.
5. Configure ssh login. In Cygwin, run ssh-keygen.
6. Start the sshd service in Windows, either from Control Panel > Services or with net start sshd. If the service fails to start, check /var/log/ssh.log.
7. Verify ssh login. In Cygwin, run ssh localhost. Sometimes the default port 22 is not usable; we can change the port by editing sshd_config (Port xxx) and connecting with ssh localhost -p xxx. For detailed logs, use ssh -v localhost.
8. Download and extract Hadoop into a folder.
9. Set JAVA_HOME in conf/hadoop-env.sh.
10. Test the setup:
cp conf/*.xml input
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Problems encountered during installation
1. The first time, installing the sshd service failed. I had to run sc delete sshd to delete the service and then run ssh-host-config again.
2. Error: Privilege separation user sshd does not exist. Manually add the following line to /etc/passwd:
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
/etc/passwd format: username:password:user id:group id:description:home directory:shell. When a user logs in, a shell process is started to pass the user's input to the kernel.
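The seven colon-separated fields can be picked apart mechanically; a minimal Java sketch using the sshd line added above:

```java
public class PasswdFields {
    public static void main(String[] args) {
        // The line added to /etc/passwd for the privilege-separation user.
        String line = "sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin";
        // Split on ':' into the seven fields; limit -1 keeps trailing empty fields.
        String[] f = line.split(":", -1);
        System.out.println("username       = " + f[0]);
        System.out.println("password       = " + f[1]);
        System.out.println("user id        = " + f[2]);
        System.out.println("group id       = " + f[3]);
        System.out.println("description    = " + f[4]);
        System.out.println("home directory = " + f[5]);
        System.out.println("shell          = " + f[6]);
    }
}
```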
3. Error: Connection closed by ::1
If user A needs to connect via ssh to user B on host B, we need to append A's public key to a file called "authorized_keys" under the .ssh folder in host B's home/<user B> directory. Create the authorized_keys file: vi authorized_keys. Append the public key to it: cat id_rsa.pub >> authorized_keys
For ssh, the access rights of the .ssh folder and the authorized_keys file must be set correctly:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
(no one other than the owner may have write access to the authorized_keys file)
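The same 700/600 permissions can be set programmatically; a sketch using java.nio.file on a throwaway temp directory (the directory and file here stand in for ~/.ssh and ~/.ssh/authorized_keys; POSIX systems only):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class SshPermissions {
    // Returns the resulting permission string of the key file.
    static String demo() throws IOException {
        // Illustrative temp locations standing in for ~/.ssh and authorized_keys.
        Path sshDir = Files.createTempDirectory("dot-ssh-demo");
        Path authKeys = Files.createFile(sshDir.resolve("authorized_keys"));

        // chmod 700 .ssh: only the owner may read, write, and enter.
        Files.setPosixFilePermissions(sshDir, PosixFilePermissions.fromString("rwx------"));
        // chmod 600 authorized_keys: only the owner may read and write.
        Files.setPosixFilePermissions(authKeys, PosixFilePermissions.fromString("rw-------"));

        return PosixFilePermissions.toString(Files.getPosixFilePermissions(authKeys));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("authorized_keys -> " + demo()); // rw-------
    }
}
```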
4. Error starting Hadoop: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-jizhan\mapred\staging\jizhan…..\.staging
This problem is caused by a Windows compatibility problem in the class org.apache.hadoop.fs.FileUtil. We need to manually change the method checkReturnValue to log a warning instead of throwing the exception.
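The patched method then looks roughly like this. This is a self-contained sketch: the real method lives in org.apache.hadoop.fs.FileUtil and takes a File and an FsPermission, which are reduced here to plain strings so the idea is runnable on its own.

```java
import java.util.logging.Logger;

public class FileUtilPatch {
    private static final Logger LOG = Logger.getLogger(FileUtilPatch.class.getName());

    // Sketch of the patched checkReturnValue: the original threw
    // java.io.IOException when rv is false; the workaround only logs a warning,
    // so the job can continue on Windows where the chmod call reports failure.
    static void checkReturnValue(boolean rv, String path, String permission) {
        if (!rv) {
            LOG.warning("Failed to set permissions of path: " + path
                    + " to " + permission);
        }
    }

    public static void main(String[] args) {
        // rv == false used to abort the job; now it merely warns.
        checkReturnValue(false, "\\tmp\\hadoop\\mapred\\staging", "0700");
        System.out.println("no exception thrown");
    }
}
```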
Reference
http://bbym010.iteye.com/blog/1019653
Running Hadoop
1. Standalone mode: leave the default configuration. Put the files to process directly under the hadoop/input folder (no upload to the Hadoop file system is needed). Output files are written to the hadoop/output folder.
2. Pseudo-distributed mode:
core-site.xml
<configuration>
 <property>
<name>fs.default.name</name>
<value>hdfs://localhost:9890</value>
</property>
</configuration>
mapred-site.xml
<configuration>
 <property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9891</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
 <property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Make sure that localhost is in the masters file, and that localhost is in the slaves file.
Problems encountered running in standalone mode
1. Reducer does not execute. There are a few things to check when encountering this problem:
- It is good to explicitly specify the mapper's and reducer's output key class and value class on the job.
- The actual mapper and reducer type parameters must match that specification, and the mapper's output types must match the reducer's input types.
- A raw Context object will not be accepted by the map or reduce method; you need the strongly typed nested context, Mapper<InputKey, InputValue, OutputKey, OutputValue>.Context or Reducer<InputKey, InputValue, OutputKey, OutputValue>.Context.
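The type-matching constraint can be illustrated without Hadoop. In this sketch the Mapper, Reducer, and runJob names are hypothetical stand-ins, not Hadoop classes; the shared K2/V2 type parameters make the compiler reject any job whose mapper output types differ from the reducer input types, which is exactly the mistake that silently breaks real jobs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class TypeMatching {
    // Minimal stand-ins for Hadoop's Mapper and Reducer (hypothetical).
    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }
    interface Reducer<K2, V2, K3, V3> {
        void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> emit);
    }

    // K2/V2 appear in both signatures, so the mapper's output types must
    // equal the reducer's input types or the call will not compile.
    static <K1, V1, K2, V2, K3, V3> Map<K3, V3> runJob(
            Map<K1, V1> input, Mapper<K1, V1, K2, V2> m, Reducer<K2, V2, K3, V3> r) {
        Map<K2, List<V2>> shuffle = new HashMap<>();
        input.forEach((k, v) -> m.map(k, v, (k2, v2) ->
                shuffle.computeIfAbsent(k2, x -> new ArrayList<>()).add(v2)));
        Map<K3, V3> output = new HashMap<>();
        shuffle.forEach((k2, vs) -> r.reduce(k2, vs, output::put));
        return output;
    }

    public static void main(String[] args) {
        // Word count: the mapper emits <word, 1>, the reducer sums the ones.
        Mapper<Long, String, String, Integer> mapper = (offset, line, emit) -> {
            for (String w : line.split("\\s+")) emit.accept(w, 1);
        };
        Reducer<String, Integer, String, Integer> reducer = (w, ones, emit) ->
                emit.accept(w, ones.stream().mapToInt(Integer::intValue).sum());

        Map<Long, String> input = Map.of(0L, "a b a", 6L, "b");
        System.out.println(runJob(input, mapper, reducer)); // counts: a=2, b=2
    }
}
```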
2. LineReader does not read lines correctly: a shorter line carries extra characters from the previous, longer line. This is due to a wrong way of using Text. A Text object has an internal byte array and an end index, so after reading a longer line the buffer may retain extra data from internal buffer expansion; those bytes are not cleared when a shorter line is read, and only the bytes before the index belong to the current line. Do not use new String(text.getBytes()) to convert a Text to a String; use text.toString().
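The effect can be reproduced with a plain byte buffer. ReusedText below is a hypothetical stand-in for org.apache.hadoop.io.Text's buffer-plus-length behaviour, not the real class:

```java
import java.nio.charset.StandardCharsets;

public class TextBufferDemo {
    // Stand-in for org.apache.hadoop.io.Text: a growable byte buffer plus a
    // length index; the buffer is reused and NOT cleared for shorter lines.
    static class ReusedText {
        byte[] buf = new byte[0];
        int len = 0;

        void set(String line) {
            byte[] b = line.getBytes(StandardCharsets.UTF_8);
            if (b.length > buf.length) buf = new byte[b.length]; // grow only
            System.arraycopy(b, 0, buf, 0, b.length);
            len = b.length; // bytes past len are stale leftovers
        }

        byte[] getBytes() { return buf; } // whole buffer, like Text.getBytes()

        @Override
        public String toString() { // like Text.toString(): respects the index
            return new String(buf, 0, len, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a long first line");
        t.set("short");
        // Wrong: decodes the whole buffer, stale tail included.
        System.out.println(new String(t.getBytes(), StandardCharsets.UTF_8)); // "shortg first line"
        // Right: only the first len bytes belong to the current line.
        System.out.println(t.toString()); // "short"
    }
}
```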
Problem encountered running in pseudo-distributed mode
Error running a map-reduce program:
14/01/19 12:21:25 WARN mapred.JobClient: Error reading task output http://L-SHC-0436751.corp.ebay.com:50060/tasklog?plaintext=true&attemptid=attempt_20140119128_0002_m_000001_2&filter=stderr
Hadoop uses a unix file link to redirect output from {HADOOP_DIR}/logs to tmp/hadoop-jizhan/mapred/local (note that hadoop.tmp.dir is tmp/hadoop-jizhan/). On Windows the JDK does not recognize this link as a directory, and the exception is thrown.
To avoid the redirection, we can set the property HADOOP_LOG_DIR to point directly at /tmp/mapred/local (this is the Cygwin /tmp folder), and use the unix ln command to map it to the local folder c:/tmp/hadoop-jizhan/mapred/local.