
Hadoop 2.x in Practice: Setting Up a Local Eclipse Development Environment and Running the WordCount Example Locally

2016-06-28 08:54
Original work by Lin Bingwen (Evankaka). Please credit the source when reposting: http://blog.csdn.net/evankaka
Abstract: This article explains how to run Hadoop applications remotely from Eclipse on Windows.
Hadoop version: 2.7.2
Hadoop host OS: Ubuntu 14.04
Eclipse version: Luna
Eclipse host OS: Windows 64-bit
I. Development Environment Configuration
1. Download Hadoop on Windows
Download the hadoop-2.7.2.tar.gz release from http://hadoop.apache.org/releases.html as shown below:


Extract it to a convenient location:


2. Install hadoop-eclipse-plugin-2.7.2.jar (the exact version depends on your Hadoop version)
Download: http://download.csdn.net/detail/tondayong1981/9432425

First, copy hadoop-eclipse-plugin-2.7.2.jar (again, the exact version depends on your Hadoop version) into the plugins folder of the Eclipse installation directory. If you see the following view after restarting Eclipse, the Hadoop plugin has been installed successfully:


The "hadoop installation directory" setting should point to your Hadoop installation directory (the directory extracted in the previous step). On Windows, simply extract the downloaded hadoop-2.7.2.tar.gz to some location and point this setting at it.
3. Configure the Connection Parameters
Select Window -> Show View -> Other...


After confirming, select Map/Reduce Locations in the dialog that appears, then right-click and choose New Hadoop location


Configure the Map/Reduce Location in Eclipse as shown below

Note that linmaster here refers to the address of the Linux machine.
The mapping is defined in C:\Windows\System32\drivers\etc\hosts:
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host

# localhost name resolution is handled within DNS itself.
#127.0.0.1 localhost
#10.75.4.22 localhost
10.75.4.25 linmaster


Be careful: this address must be exactly the same as the Linux machine's. You can check it on Ubuntu by running ifconfig, and verify the mapping from Windows with ping linmaster.


The Map/Reduce Master settings correspond to mapred-site.xml (note that linmaster stands for an IP address and must be configured in etc/hosts).
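For reference, on a Hadoop 2.x cluster mapred-site.xml usually does little more than select the YARN framework. The snippet below is an illustrative example, not a copy of the author's file; your cluster's own mapred-site.xml is authoritative:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>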


The DFS Master settings correspond to core-site.xml on the Linux side (again, linmaster stands for an IP address and must be configured in etc/hosts).
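The DFS Master host and port must match fs.defaultFS in core-site.xml. Given the hdfs://linmaster:9000 URI used throughout this article, that file presumably contains something like this (an illustrative sketch, not the author's exact file):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://linmaster:9000</value>
    </property>
</configuration>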


On the Linux side, the IP-to-hostname mapping is added by editing /etc/hosts (e.g. with vi /etc/hosts).


Once everything above is set up, start Hadoop first. Then, in the Eclipse project explorer, click DFS Locations; if it expands, the connection has succeeded.
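If DFS Locations refuses to expand, it helps to test connectivity outside the plugin. The class below is a minimal sketch (assuming the hdfs://linmaster:9000 address used throughout this article; it is not part of the original setup) that lists the HDFS root directory from Windows and fails fast with a clear exception when the NameNode is unreachable:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect directly to the NameNode address configured in the plugin
        FileSystem fs = FileSystem.get(URI.create("hdfs://linmaster:9000"), conf);
        // Listing the root directory exercises the same RPC path the plugin uses
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}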


II. Running the Hadoop Application Locally

1. Create a Maven project and add the Hadoop dependencies. The pom.xml is shown below. Two caveats: the <parent> block refers to the author's own parent pom (com.func:axc), so drop it and declare your own groupId/version if you don't have that parent; and the dependency versions here are 2.2.0 even though the cluster runs 2.7.2 (matching the client versions to your cluster is generally safer).

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.func</groupId>
        <artifactId>axc</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <artifactId>axc-hadoop</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.6</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
</project>
Next, write a WordCount program. First, copy the configuration files from etc/hadoop of the Hadoop installation on Linux into the project. Make sure they are copied from the Linux machine where Hadoop is actually installed!
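To confirm that Eclipse is actually picking these files up from the classpath, a quick sketch like the one below can help. The hadoop/ resource folder name is an assumption here; adjust it to wherever you placed the copied files (for example src/main/resources/hadoop/):

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Plain names are resolved as classpath resources
        conf.addResource("hadoop/core-site.xml");
        conf.addResource("hadoop/hdfs-site.xml");
        // Should print hdfs://linmaster:9000 if core-site.xml was found
        System.out.println(conf.get("fs.defaultFS"));
    }
}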


Then write the WordCount code. It uses the older org.apache.hadoop.mapred API, which is deprecated on 2.x but still works:

package com.lin;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class WordCountMapper extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one); // split the line into tokens and emit each as a (word, 1) pair for the reducer
            }
        }
    }

    public static class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) { // map outputs with the same key arrive at the same reducer, so loop and accumulate
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result); // write the (word, count) result to HDFS
        }
    }

    public static void main(String[] args) throws Exception {
        //System.setProperty("hadoop.home.dir", "D:\\project\\hadoop-2.7.2");
        String input = "hdfs://linmaster:9000/input/LICENSE.txt"; // make sure this file exists on the Linux side
        String output = "hdfs://linmaster:9000/output14/";

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }

}

The output is as follows:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/Java/Repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/Java/Repository/ch/qos/logback/logback-classic/1.0.13/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/06/19 16:12:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/06/19 16:12:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/06/19 16:12:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/06/19 16:12:46 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/06/19 16:12:46 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/06/19 16:12:46 INFO mapred.FileInputFormat: Total input paths to process : 1
16/06/19 16:12:46 INFO mapreduce.JobSubmitter: number of splits:1
16/06/19 16:12:46 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/06/19 16:12:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local536235441_0001
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/staging/linbingwen536235441/.staging/job_local536235441_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/staging/linbingwen536235441/.staging/job_local536235441_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/local/localRunner/linbingwen/job_local536235441_0001/job_local536235441_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/local/localRunner/linbingwen/job_local536235441_0001/job_local536235441_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
16/06/19 16:12:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/06/19 16:12:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/06/19 16:12:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/06/19 16:12:47 INFO mapreduce.Job: Running job: job_local536235441_0001
16/06/19 16:12:47 INFO mapred.LocalJobRunner: Waiting for map tasks
16/06/19 16:12:47 INFO mapred.LocalJobRunner: Starting task: attempt_local536235441_0001_m_000000_0
16/06/19 16:12:47 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
16/06/19 16:12:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@12e29f36
16/06/19 16:12:47 INFO mapred.MapTask: Processing split: hdfs://linmaster:9000/input/LICENSE.txt:0+15429
16/06/19 16:12:47 INFO mapred.MapTask: numReduceTasks: 1
16/06/19 16:12:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/06/19 16:12:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/06/19 16:12:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/06/19 16:12:47 INFO mapred.MapTask: soft limit at 83886080
16/06/19 16:12:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/06/19 16:12:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/06/19 16:12:47 INFO mapred.LocalJobRunner:
16/06/19 16:12:47 INFO mapred.MapTask: Starting flush of map output
16/06/19 16:12:47 INFO mapred.MapTask: Spilling map output
16/06/19 16:12:47 INFO mapred.MapTask: bufstart = 0; bufend = 22735; bufvoid = 104857600
16/06/19 16:12:47 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26205772(104823088); length = 8625/6553600
16/06/19 16:12:48 INFO mapreduce.Job: Job job_local536235441_0001 running in uber mode : false
16/06/19 16:12:48 INFO mapreduce.Job: map 0% reduce 0%
16/06/19 16:12:48 INFO mapred.MapTask: Finished spill 0
16/06/19 16:12:48 INFO mapred.Task: Task:attempt_local536235441_0001_m_000000_0 is done. And is in the process of committing
16/06/19 16:12:48 INFO mapred.LocalJobRunner: hdfs://linmaster:9000/input/LICENSE.txt:0+15429
16/06/19 16:12:48 INFO mapred.Task: Task 'attempt_local536235441_0001_m_000000_0' done.
16/06/19 16:12:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local536235441_0001_m_000000_0
16/06/19 16:12:48 INFO mapred.LocalJobRunner: Map task executor complete.
16/06/19 16:12:48 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
16/06/19 16:12:48 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6f0c29fe
16/06/19 16:12:48 INFO mapred.Merger: Merging 1 sorted segments
16/06/19 16:12:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10982 bytes
16/06/19 16:12:48 INFO mapred.LocalJobRunner:
16/06/19 16:12:48 INFO mapred.Task: Task:attempt_local536235441_0001_r_000000_0 is done. And is in the process of committing
16/06/19 16:12:48 INFO mapred.LocalJobRunner:
16/06/19 16:12:48 INFO mapred.Task: Task attempt_local536235441_0001_r_000000_0 is allowed to commit now
16/06/19 16:12:48 INFO output.FileOutputCommitter: Saved output of task 'attempt_local536235441_0001_r_000000_0' to hdfs://linmaster:9000/output14/_temporary/0/task_local536235441_0001_r_000000
16/06/19 16:12:48 INFO mapred.LocalJobRunner: reduce > reduce
16/06/19 16:12:48 INFO mapred.Task: Task 'attempt_local536235441_0001_r_000000_0' done.
16/06/19 16:12:49 INFO mapreduce.Job: map 100% reduce 100%
16/06/19 16:12:49 INFO mapreduce.Job: Job job_local536235441_0001 completed successfully
16/06/19 16:12:49 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=11306
FILE: Number of bytes written=409424
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=30858
HDFS: Number of bytes written=8006
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=289
Map output records=2157
Map output bytes=22735
Map output materialized bytes=10992
Input split bytes=91
Combine input records=2157
Combine output records=755
Reduce input groups=755
Reduce shuffle bytes=0
Reduce input records=755
Reduce output records=755
Spilled Records=1510
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=514588672
File Input Format Counters
Bytes Read=15429
File Output Format Counters
Bytes Written=8006
This shows the program ran successfully!
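To inspect the counts without going through the DFS Locations view, the sketch below reads the result file back from HDFS. It assumes the part-00000 file name that the old-API TextOutputFormat produces and the same linmaster:9000 address as above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://linmaster:9000"), conf);
        // The old mapred API writes reducer output as part-00000, part-00001, ...
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/output14/part-00000"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line: word <TAB> count
            }
        }
        fs.close();
    }
}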

III. Problems and Solutions

Before the program runs successfully, you will almost certainly run into the following problems:

1.
org.apache.hadoop.security.AccessControlException: Permission denied: user=linbingwen, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x
Solution:
Add the following property to hdfs-site.xml (this turns off HDFS permission checking, so use it only on a development cluster):
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
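A less invasive alternative, not from the original article: instead of disabling permission checking cluster-wide, the Windows-side client can identify itself as the user that owns the HDFS tree. Hadoop's UserGroupInformation honors the HADOOP_USER_NAME system property, so one line is enough. The user name hadoop below is an assumption; use whatever owner hdfs dfs -ls / reports on your cluster:

// First statement in main, before any Hadoop class is touched.
// "hadoop" is an assumed user name; replace it with your HDFS superuser.
System.setProperty("HADOOP_USER_NAME", "hadoop");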
2.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /output12/_temporary/0. Name node is in safe mode.
The reported blocks 17 has reached the threshold 0.9990 of total blocks 17. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 4 seconds.
Solution:
bin/hadoop dfsadmin -safemode leave
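If you would rather do this from Java (a hedged sketch assuming the same linmaster address; not part of the original article), the DistributedFileSystem client exposes the same operation as the shell command:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class LeaveSafeMode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://linmaster:9000"), conf);
        // Equivalent to: hadoop dfsadmin -safemode leave
        ((DistributedFileSystem) fs).setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);
        fs.close();
    }
}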
3.
Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unable to locate a login configuration)
Solution:
Add the following to the hosts file under C:\Windows\System32\drivers\etc (the exact line depends on how your Linux machine is configured):
10.75.4.25    linmaster

4.
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Analysis:
Hadoop 2.x distributions do not ship winutils.exe in their bin directory.
Solution:
Download hadoop-common-2.2.0-bin-master.zip from https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master, extract it, and copy everything under its bin folder into the bin directory of the Hadoop 2 distribution you downloaded earlier. As shown below:


Set the HADOOP_HOME path. If HADOOP_HOME is empty, fullExeName will inevitably resolve to null\bin\winutils.exe. The fix is simple: configure the environment variable.




Note: the environment variable only takes effect after you restart the computer (this bears repeating: restart!). If you would rather not restart, add the following to your program instead, as the very first statement in main, before any Hadoop class is used:
System.setProperty("hadoop.home.dir", "E:\\Program Files\\hadoop-2.7.0");
Note: E:\\Program Files\\hadoop-2.7.0 is the path where I extracted Hadoop on my machine.

5.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Analysis:
hadoop.dll is missing from C:\Windows\System32.
Solution:
Copy hadoop.dll from the bin folder of hadoop-common-2.2.0-bin-master into C:\Windows\System32. In my case this worked without restarting.