
Fixing a freshly installed Hadoop 2 that cannot run MapReduce jobs

2017-05-03 18:14
Running the hadoop classpath command as the hadoop user prints the full classpath required to run Hadoop programs. Then open the shell profile with vi .bashrc (on Debian; on Red Hat the file is .bash_profile) and append:
export CLASSPATH=.:/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/common/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/hdfs:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/yarn/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.6.0-cdh5.5.2/contrib/capacity-scheduler/*.jar
That resolves the problem.

Apply the change to the current shell:
source .bashrc
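
Rather than pasting the long path list by hand, the output of hadoop classpath can be captured directly; a minimal sketch, assuming the hadoop command is already on the PATH:

# Let Hadoop generate the classpath itself, so the entry survives upgrades
export CLASSPATH=.:$(hadoop classpath)

Putting this single line in .bashrc keeps CLASSPATH in sync with whatever release is installed, with no editing when the version changes.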

Now run a simple MapReduce job:
[hadoop@h40 ~]$ vi WordCount.java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Mapper: split each input line into tokens and emit (word, 1)
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0]: input path on HDFS; args[1]: output path (must not exist yet)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
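
The listing above uses the old org.apache.hadoop.mapred API, which Hadoop 2 still supports. For comparison, here is a minimal sketch of the same job written against the newer org.apache.hadoop.mapreduce API (the class name WordCount2 and its member names are illustrative, not part of the original post):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount2 {

    // Mapper: tokenize each line and emit (word, 1)
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also usable as the combiner): sum the counts per word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

It compiles and runs the same way, e.g. hadoop jar xx.jar WordCount2 /input/he.txt /output.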
Method 1:
[hadoop@h40 ~]$ /usr/jdk1.7.0_25/bin/javac WordCount.java 
(The javac command must be run in the window where source .bashrc (or .bash_profile) was executed, or in a newly opened window; in a window where the updated profile has not been sourced, compilation fails because the Hadoop classes cannot be found.)
[hadoop@h40 ~]$ /usr/jdk1.7.0_25/bin/jar cvf xx.jar WordCount*.class
added manifest
adding: WordCount.class(in = 1516) (out= 744)(deflated 50%)
adding: WordCount$Map.class(in = 1918) (out= 794)(deflated 58%)
adding: WordCount$Reduce.class(in = 1591) (out= 642)(deflated 59%)
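
Before submitting, you can confirm that all three classes made it into the jar:

[hadoop@h40 ~]$ /usr/jdk1.7.0_25/bin/jar tf xx.jar

The listing should show WordCount.class, WordCount$Map.class, and WordCount$Reduce.class alongside the manifest.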

Method 2:
Create the project in MyEclipse, right-click it, and export it as a jar named xx.jar. Then upload xx.jar to the Master host; with VMware Tools installed in the VM you can copy it by simple drag-and-drop.
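
If VMware Tools is not available, copying over SSH works just as well (a sketch, run from the machine where xx.jar was exported; user, host, and path follow the rest of this post):

scp xx.jar hadoop@h40:/home/hadoop/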

[hadoop@h40 ~]$ vi he.txt
hello world
hello hadoop
hello hive
[hadoop@h40 ~]$ hadoop fs -mkdir /input
[hadoop@h40 ~]$ hadoop fs -put he.txt /input
[hadoop@h40 ~]$ hadoop jar xx.jar WordCount /input/he.txt /output (the /output directory must not already exist)
[hadoop@h40 ~]$ hadoop fs -cat /output/part-00000
hadoop  1
hello   3
hive    1
world   1
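
Because the output directory must not already exist, a re-run needs the old one removed first; otherwise the job fails immediately with FileAlreadyExistsException:

[hadoop@h40 ~]$ hadoop fs -rm -r /output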

To view job details in a browser, Hadoop 2 provides the MapReduce JobHistory Server at http://jhs_host:port/. The default port is 19888, and the address is controlled by the mapreduce.jobhistory.webapp.address parameter.
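
If the page is not reachable, the history server addresses can be pinned explicitly in mapred-site.xml; a sketch using the host name from this post (values shown are the conventional ports, adjust for your cluster):

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>h40:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>h40:19888</value>
</property>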

[hadoop@h40 ~]$ mapred historyserver

(Started this way from a CRT terminal, it runs in the foreground and does not exit on its own; if you press Ctrl+C to stop it, the web page stops loading as well.)
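
To keep the history server running after the terminal closes, Hadoop 2 ships a daemon wrapper in $HADOOP_HOME/sbin (assumed to be on the PATH here):

[hadoop@h40 ~]$ mr-jobhistory-daemon.sh start historyserver
[hadoop@h40 ~]$ mr-jobhistory-daemon.sh stop historyserver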



Addendum: you can also run the WordCount example that ships with Hadoop:

mkdir input
echo "Hello Docker" >input/file2.txt
echo "Hello Hadoop" >input/file1.txt

# create input directory on HDFS
hadoop fs -mkdir -p input

# put input files to HDFS
hdfs dfs -put ./input/* input

# run wordcount
# (note: the compiled examples jar also works and is the more usual choice:
#  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.0-sources.jar org.apache.hadoop.examples.WordCount input output

# print the input files
echo -e "\ninput file1.txt:"
hdfs dfs -cat input/file1.txt

echo -e "\ninput file2.txt:"
hdfs dfs -cat input/file2.txt

# print the output of wordcount
echo -e "\nwordcount output:"
hdfs dfs -cat output/part-r-00000
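
Given the two input files above, the final command should print (the counts are tab-separated by TextOutputFormat):

Docker  1
Hadoop  1
Hello   2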