
Running the max-temperature weather example from Hadoop: The Definitive Guide

2017-10-15 17:25
1. Prepare the code (the full listings are at the end of this post).

My Hadoop installation path is /Users/chenxun/software/hadoop-2.8.1, so I created a myclass directory under it and put the source files there, as shown below:

[chenxun@chen.local 17:21 ~/software/hadoop-2.8.1/myclass]$ll
total 64
-rw-r--r--  1 chenxun  staff  1017 10 15 15:36 MaxTemperature.java
-rw-r--r--  1 chenxun  staff   977 10 15 15:39 MaxTemperatureMapper.java
-rw-r--r--  1 chenxun  staff   579 10 15 15:39 MaxTemperatureReducer.java


2. Configure the CLASSPATH for compiling the code

Set up the Java environment and the Hadoop dependency jars that compilation needs:

vim ~/.bash_profile

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/Users/chenxun/software/hadoop-2.8.1
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar; do
    export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar; do
    export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar; do
    export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/yarn/hadoop-*.jar; do
    export CLASSPATH=$CLASSPATH:$f
done

# use a /* wildcard so the jars inside each module's lib directory are picked up
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_HOME/share/hadoop/yarn/lib/*

source ~/.bash_profile


3. Compile the code and package it into a jar

javac *.java

jar -cvf MaxTemperature.jar .

[chenxun@chen.local 17:21 ~/software/hadoop-2.8.1/myclass]$ll
total 64
-rw-r--r--  1 chenxun  staff  1413 10 15 15:40 MaxTemperature.class
-rw-r--r--  1 chenxun  staff  6333 10 15 16:18 MaxTemperature.jar
-rw-r--r--  1 chenxun  staff  1017 10 15 15:36 MaxTemperature.java
-rw-r--r--  1 chenxun  staff  1876 10 15 15:40 MaxTemperatureMapper.class
-rw-r--r--  1 chenxun  staff   977 10 15 15:39 MaxTemperatureMapper.java
-rw-r--r--  1 chenxun  staff  1687 10 15 15:40 MaxTemperatureReducer.class
-rw-r--r--  1 chenxun  staff   579 10 15 15:39 MaxTemperatureReducer.java


4. Prepare the data

Download the NCDC weather data from ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2010/

I put the weather data into file.txt; the records look like this:

0029227070999991901122820004+62167+030650FM-12+010299999V0200501N003119999999N0000001N9-01561+99999100061ADDGF108991999999999999999999

0029227070999991901122906004+62167+030650FM-12+010299999V0200901N003119999999N0000001N9-01501+99999100181ADDGF108991999999999999999999

0029227070999991901122913004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-01561+99999100271ADDGF104991999999999999999999

0029227070999991901122920004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-02001+99999100501ADDGF107991999999999999999999

0029227070999991901123006004+62167+030650FM-12+010299999V0200701N003119999999N0000001N9-01501+99999100791ADDGF108991999999999999999999

0029227070999991901123013004+62167+030650FM-12+010299999V0200901N003119999999N0000001N9-01331+99999100901ADDGF108991999999999999999999

0029227070999991901123020004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-01221+99999100831ADDGF108991999999999999999999

0029227070999991901123106004+62167+030650FM-12+010299999V0200701N004119999999N0000001N9-01391+99999100521ADDGF108991999999999999999999

0029227070999991901123113004+62167+030650FM-12+010299999V0200701N003119999999N0000001N9-01391+99999100321ADDGF108991999999999999999999

0029227070999991901123120004+62167+030650FM-12+010299999V0200701N004119999999N0000001N9-01391+99999100281ADDGF108991999999999999999999
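Each record is a fixed-width NCDC line: the mapper pulls the year from offsets 15-19 and the signed temperature (in tenths of a degree Celsius) from offsets 87-92 (0-based, as String.substring counts them). As a sketch, the slicing can be checked in plain Java with no Hadoop at all; NcdcRecordDemo is just a hypothetical name for this demo, using the first sample record above:

```java
// Plain-Java sketch of the fixed-width slicing the mapper performs on one
// NCDC record; offsets are 0-based, matching String.substring.
public class NcdcRecordDemo {

    // Offsets 15-18 hold the four-digit observation year.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Offsets 87-91 hold the signed temperature in tenths of a degree Celsius;
    // skip a leading '+' (pre-Java-7 Integer.parseInt rejects it).
    static int airTemperature(String line) {
        int start = (line.charAt(87) == '+') ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    // Offset 92 is the quality code; [01459] marks a trustworthy reading.
    static String quality(String line) {
        return line.substring(92, 93);
    }

    public static void main(String[] args) {
        String line = "0029227070999991901122820004+62167+030650FM-12+010299999"
                + "V0200501N003119999999N0000001N9-01561+99999100061ADDGF108991999999999999999999";
        System.out.println(year(line) + " " + airTemperature(line) + " " + quality(line));
    }
}
```

For this record it prints "1901 -156 1", i.e. a reading of -15.6°C in 1901 with quality code 1.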

Create the HDFS input directory:

[chenxun@chen.local 16:42 ~/software/hadoop-2.8.1/myclass]$hadoop fs -mkdir -p /user/chenxun/data
[chenxun@chen.local 16:42 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun
Found 3 items
drwxr-xr-x   - chenxun supergroup          0 2017-10-15 16:42 /user/chenxun/data
drwxr-xr-x   - chenxun supergroup          0 2017-10-14 01:54 /user/chenxun/input
drwxr-xr-x   - chenxun supergroup          0 2017-10-14 01:55 /user/chenxun/output


Upload the weather data to the input directory:

[chenxun@chen.local 16:47 ~/software/hadoop-2.8.1/myclass]$hadoop fs -put ./data/file.txt /user/chenxun/data
[chenxun@chen.local 16:47 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun/data
Found 1 items
-rw-r--r--   1 chenxun supergroup       9855 2017-10-15 16:47 /user/chenxun/data/file.txt


Run the job:

[chenxun@chen.local 17:10 ~/software/hadoop-2.8.1/myclass]$hadoop jar MaxTemperature.jar  MaxTemperature /user/chenxun/data/file.txt /user/chenxun/dataoutput
...
[chenxun@chen.local 17:11 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun/dataoutput
Found 2 items
-rw-r--r--   1 chenxun supergroup          0 2017-10-15 17:11 /user/chenxun/dataoutput/_SUCCESS
-rw-r--r--   1 chenxun supergroup          9 2017-10-15 17:11 /user/chenxun/dataoutput/part-r-00000
[chenxun@chen.local 17:12 ~/software/hadoop-2.8.1/myclass]$hadoop fs -cat /user/chenxun/dataoutput/part-r-00000
1901    -56
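As a sanity check on the reducer logic: the ten sample records shown earlier carry the temperatures -156, -150, -156, -200, -150, -133, -122, -139, -139 and -139 (tenths of a degree), whose maximum is -122; the job's answer of -56 comes from the full file.txt, which contains more records than the excerpt. The reducer's max loop can be exercised standalone (MaxDemo is just a hypothetical name for this sketch):

```java
// Standalone version of the reducer's max loop, fed the temperatures
// parsed from the ten sample records (tenths of a degree Celsius).
public class MaxDemo {

    static int max(int[] values) {
        int maxValue = Integer.MIN_VALUE;   // same seed value the reducer uses
        for (int v : values) {
            maxValue = Math.max(maxValue, v);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        int[] sampleTemps = {-156, -150, -156, -200, -150, -133, -122, -139, -139, -139};
        System.out.println("1901\t" + max(sampleTemps));   // prints 1901<tab>-122
    }
}
```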


The code:

MaxTemperature.java

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


MaxTemperatureMapper.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}


MaxTemperatureReducer.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context)
            throws IOException, InterruptedException {

        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}