
Installing hadoop-1.2.1 on CentOS 7

A quick note on the directories I use on Linux for software archives and installed software:

/opt/software/ -- software archives (tarballs)

/opt/modules/ -- installed software

/opt/tools/ -- tools

/opt/data/ -- test data

These are just my own conventions; you can choose your own locations for archives and installations.
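
As a minimal sketch (the paths are this article's conventions, not anything Hadoop itself requires), the layout can be created up front:

sudo mkdir -p /opt/software /opt/modules /opt/tools /opt/data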

1. Extract hadoop-1.2.1

tar -zxvf /opt/software/hadoop-1.2.1-bin.tar.gz


2. Copy the extracted directory to the modules directory (note the -r flag, since it is a directory, not a file)

sudo cp -r hadoop-1.2.1 /opt/modules/
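
Alternatively (a small shortcut, not in the original steps), tar can extract straight into the target directory with -C, which makes the copy unnecessary:

sudo tar -zxvf /opt/software/hadoop-1.2.1-bin.tar.gz -C /opt/modules/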


I. Configure standalone mode first

1. Set the JDK path in hadoop-env.sh

vim hadoop-1.2.1/conf/hadoop-env.sh


Add the following line, using the path of your own JDK installation:

export JAVA_HOME=/opt/modules/jdk1.7.0_79

2. Save and exit, then apply the change:

source /opt/modules/hadoop-1.2.1/conf/hadoop-env.sh
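
A quick sanity check (not in the original steps) that the variable is now set in the current shell:

echo $JAVA_HOME    # should print /opt/modules/jdk1.7.0_79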


3. Add the Hadoop installation path to /etc/profile

vim /etc/profile


Append the following at the end of the file:

##HADOOP

export HADOOP_HOME=/opt/modules/hadoop-1.2.1

export PATH=$PATH:$HADOOP_HOME/bin

Apply the change:

source /etc/profile


4. Test that the configuration works:

hadoop


Warning: $HADOOP_HOME is deprecated.

Usage: hadoop [--config confdir] COMMAND

where COMMAND is one of:

namenode -format format the DFS filesystem

secondarynamenode run the DFS secondary namenode

namenode run the DFS namenode

datanode run a DFS datanode

dfsadmin run a DFS admin client

mradmin run a Map-Reduce admin client

fsck run a DFS filesystem checking utility

fs run a generic filesystem user client

balancer run a cluster balancing utility

oiv apply the offline fsimage viewer to an fsimage

fetchdt fetch a delegation token from the NameNode

jobtracker run the MapReduce job Tracker node

pipes run a Pipes job

tasktracker run a MapReduce task Tracker node

historyserver run job history servers as a standalone daemon

job manipulate MapReduce jobs

queue get information regarding JobQueues

version print the version

jar <jar> run a jar file

distcp <srcurl> <desturl> copy file or directories recursively

distcp2 <srcurl> <desturl> DistCp version 2

archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

classpath prints the class path needed to get the

Hadoop jar and the required libraries

daemonlog get/set the log level for each daemon

or

CLASSNAME run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

5. Test that MapReduce works

Change into the data directory:

cd /opt/data/


6. Create an input directory

sudo mkdir input

7. Copy files into it

sudo cp /opt/modules/hadoop-1.2.1/conf/*.xml input


8. Run one of the examples that ship with Hadoop

Change into the Hadoop directory:

cd /opt/modules/hadoop-1.2.1/


9. Run the grep example, which searches the input files for strings matching the regular expression 'dfs[a-z.]+' and writes the matches to the output directory:

bin/hadoop jar hadoop-examples-1.2.1.jar grep /opt/data/input/ /opt/data/output 'dfs[a-z.]+'


10. If the two files part-00000 and _SUCCESS appear under the output directory, the job completed successfully.
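
A quick check (not among the original steps):

ls /opt/data/output/    # should list _SUCCESS and part-00000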

11. View the file:

more /opt/data/output/part-00000


1 dfsadmin

If you see the line above, the standalone run succeeded end to end.

II. Next, configure pseudo-distributed mode

1. First configure the core file

vim conf/core-site.xml

Add the following inside the <configuration> element:

<property><!-- HDFS NameNode address and port -->
        <name>fs.default.name</name>
        <value>hdfs://master.dragon.org:9000</value>
</property>
<property><!-- directory where HDFS keeps its data -->
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/tmp</value>
</property>
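
hadoop.tmp.dir must point at a directory Hadoop can write to, so it is worth creating it up front (a precaution, not one of the original steps):

sudo mkdir -p /opt/data/tmp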


2. Next, configure the HDFS file

vim conf/hdfs-site.xml

Add the following inside the <configuration> element:

<property><!-- HDFS replication factor; the default is 3, use 1 on a single machine -->
        <name>dfs.replication</name>
        <value>1</value>
</property>
<property><!-- turn off HDFS permission checking -->
        <name>dfs.permissions</name>
        <value>false</value>
</property>


3. Configure the MapReduce file

vim conf/mapred-site.xml

Add the following inside the <configuration> element:

<property><!-- JobTracker address and port -->
        <name>mapred.job.tracker</name>
        <value>master.dragon.org:9001</value>
</property>


4. Configure the masters file in the conf directory

vim conf/masters

Enter the machine's fully qualified hostname (master.dragon.org in this setup).

5. Configure the slaves file in the conf directory; it takes the same hostname, since in pseudo-distributed mode the single machine acts as both master and slave (see the sketch below).

vim conf/slaves
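
Since both files here contain the same single hostname, they can also be written in one step (a sketch, using the hostname from the configs above):

echo master.dragon.org > conf/masters
echo master.dragon.org > conf/slaves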



6. Format the NameNode

bin/hadoop namenode -format


7. Start Hadoop

bin/start-all.sh


8. Use jps to check the running processes; if all five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) are up, the start succeeded.
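
The jps listing should look roughly like this (the PIDs are illustrative):

jps
2481 NameNode
2597 DataNode
2715 SecondaryNameNode
2795 JobTracker
2910 TaskTracker
3021 Jps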



9. Then open master.dragon.org:50070 in a browser; if the page loads, the NameNode web UI is up.



10. To reach Hadoop from Windows, add the Linux host's IP address and hostname to the Windows hosts file.
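
For example, an entry in C:\Windows\System32\drivers\etc\hosts (the IP is a placeholder; use your Linux machine's actual address):

192.168.1.100 master.dragon.org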

11. Test

Upload files to the HDFS filesystem.

First create a directory in HDFS to hold them:

hadoop fs -mkdir hdfs://master.dragon.org:9000/wc/input/


Check that it was created:

hadoop fs -lsr hdfs://master.dragon.org:9000/


drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc

drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc/input

Then upload:

hadoop fs -put conf/*.xml hdfs://master.dragon.org:9000/wc/input/


Check again that the upload succeeded:

drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc

drwxr-xr-x - root supergroup 0 2016-01-05 17:23 /wc/input

-rw-r--r-- 1 root supergroup 7457 2016-01-05 17:23 /wc/input/capacity-scheduler.xml

-rw-r--r-- 1 root supergroup 444 2016-01-05 17:23 /wc/input/core-site.xml

-rw-r--r-- 1 root supergroup 327 2016-01-05 17:23 /wc/input/fair-scheduler.xml

-rw-r--r-- 1 root supergroup 4644 2016-01-05 17:23 /wc/input/hadoop-policy.xml

-rw-r--r-- 1 root supergroup 331 2016-01-05 17:23 /wc/input/hdfs-site.xml

-rw-r--r-- 1 root supergroup 2033 2016-01-05 17:23 /wc/input/mapred-queue-acls.xml

-rw-r--r-- 1 root supergroup 276 2016-01-05 17:23 /wc/input/mapred-site.xml



To download, use the get command, mirroring the commands above.
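
For instance, pulling one of the files just uploaded back into the local data directory:

hadoop fs -get hdfs://master.dragon.org:9000/wc/input/core-site.xml /opt/data/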

12. Run an example, the built-in wordcount job:

hadoop jar hadoop-examples-1.2.1.jar wordcount hdfs://master.dragon.org:9000/wc/input/ hdfs://master.dragon.org:9000/wc/output/


16/01/05 17:32:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library

16/01/05 17:32:11 INFO input.FileInputFormat: Total input paths to process : 7

16/01/05 17:32:11 WARN snappy.LoadSnappy: Snappy native library not loaded

16/01/05 17:32:12 INFO mapred.JobClient: Running job: job_201601051530_0001

16/01/05 17:32:13 INFO mapred.JobClient: map 0% reduce 0%

16/01/05 17:33:31 INFO mapred.JobClient: map 28% reduce 0%

16/01/05 17:33:54 INFO mapred.JobClient: map 57% reduce 0%

16/01/05 17:34:05 INFO mapred.JobClient: map 57% reduce 19%

16/01/05 17:34:23 INFO mapred.JobClient: map 85% reduce 19%

16/01/05 17:34:33 INFO mapred.JobClient: map 100% reduce 19%

16/01/05 17:34:35 INFO mapred.JobClient: map 100% reduce 28%

16/01/05 17:34:37 INFO mapred.JobClient: map 100% reduce 100%

16/01/05 17:34:39 INFO mapred.JobClient: Job complete: job_201601051530_0001

16/01/05 17:34:39 INFO mapred.JobClient: Counters: 29

16/01/05 17:34:39 INFO mapred.JobClient: Job Counters

16/01/05 17:34:39 INFO mapred.JobClient: Launched reduce tasks=1

16/01/05 17:34:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=229608

16/01/05 17:34:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

16/01/05 17:34:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

16/01/05 17:34:39 INFO mapred.JobClient: Launched map tasks=7

16/01/05 17:34:39 INFO mapred.JobClient: Data-local map tasks=7

16/01/05 17:34:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=65560

16/01/05 17:34:39 INFO mapred.JobClient: File Output Format Counters

16/01/05 17:34:39 INFO mapred.JobClient: Bytes Written=6564

16/01/05 17:34:39 INFO mapred.JobClient: FileSystemCounters

16/01/05 17:34:39 INFO mapred.JobClient: FILE_BYTES_READ=15681

16/01/05 17:34:39 INFO mapred.JobClient: HDFS_BYTES_READ=15512

16/01/05 17:34:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=451968

16/01/05 17:34:39 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=6564

16/01/05 17:34:39 INFO mapred.JobClient: File Input Format Counters

16/01/05 17:34:39 INFO mapred.JobClient: Bytes Read=15512

16/01/05 17:34:39 INFO mapred.JobClient: Map-Reduce Framework

16/01/05 17:34:39 INFO mapred.JobClient: Map output materialized bytes=10651

16/01/05 17:34:39 INFO mapred.JobClient: Map input records=386

16/01/05 17:34:39 INFO mapred.JobClient: Reduce shuffle bytes=10651

16/01/05 17:34:39 INFO mapred.JobClient: Spilled Records=1196

16/01/05 17:34:39 INFO mapred.JobClient: Map output bytes=21309

16/01/05 17:34:39 INFO mapred.JobClient: Total committed heap usage (bytes)=867921920

16/01/05 17:34:39 INFO mapred.JobClient: CPU time spent (ms)=16610

16/01/05 17:34:39 INFO mapred.JobClient: Combine input records=1761

16/01/05 17:34:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=847

16/01/05 17:34:39 INFO mapred.JobClient: Reduce input records=598

16/01/05 17:34:39 INFO mapred.JobClient: Reduce input groups=427

16/01/05 17:34:39 INFO mapred.JobClient: Combine output records=598

16/01/05 17:34:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=1215692800

16/01/05 17:34:39 INFO mapred.JobClient: Reduce output records=427

16/01/05 17:34:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5859864576

16/01/05 17:34:39 INFO mapred.JobClient: Map output records=1761

While the job runs, its progress is reported as above; once it completes, view the part-r-00000 result file:

hadoop fs -cat hdfs://master.dragon.org:9000/wc/output/part-r-00000


"*" 10

"alice,bob 10

' 2

'(i.e. 2

'*', 2

'default' 2

(maximum-system-jobs 2

* 2

, 2

--> 14

-1 1

100 1

25% 1

25. 1

33% 1

4 1

50% 1

<!-- 14

</allocations> 1

</configuration> 6

</description> 19

</property> 34

</value> 2

<?xml 7

<?xml-stylesheet 5

<allocations> 1

<configuration> 6

<description> 4

<description>ACL 10

<description>If 2

<description>Maximum 1

<description>Number 1

<description>Percentage 1

<description>The 10

<name>dfs.permissions</name> 1

<name>dfs.replication</name> 1

<name>fs.default.name</name> 1

<name>hadoop.tmp.dir</name> 1

<name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name> 1

<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name> 1

<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name> 1

<name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name> 1

<name>mapred.capacity-scheduler.default-supports-priority</name> 1

<name>mapred.capacity-scheduler.default-user-limit-factor</name> 1

<name>mapred.capacity-scheduler.init-poll-interval</name> 1

<name>mapred.capacity-scheduler.init-worker-threads</name> 1

<name>mapred.capacity-scheduler.maximum-system-jobs</name> 1

<name>mapred.capacity-scheduler.queue.default.capacity</name> 1

<name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name> 1

<name>mapred.capacity-scheduler.queue.default.maximum-capacity</name> 1

<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name> 1

<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name> 1

<name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name> 1

<name>mapred.capacity-scheduler.queue.default.supports-priority</name> 1

<name>mapred.capacity-scheduler.queue.default.user-limit-factor</name> 1

<name>mapred.job.tracker</name> 1

<name>mapred.queue.default.acl-administer-jobs</name> 1

<name>mapred.queue.default.acl-submit-job</name> 1

<name>security.admin.operations.protocol.acl</name> 1

<name>security.client.datanode.protocol.acl</name> 1

<name>security.client.protocol.acl</name> 1

<name>security.datanode.protocol.acl</name> 1

<name>security.inter.datanode.protocol.acl</name> 1

<name>security.inter.tracker.protocol.acl</name> 1

<name>security.job.submission.protocol.acl</name> 1

<name>security.namenode.protocol.acl</name> 1

<name>security.refresh.policy.protocol.acl</name> 1

<name>security.task.umbilical.protocol.acl</name> 1

<property> 34

<value> 2

<value>*</value> 10

<value>-1</value> 1

<value>/opt/data/tmp</value> 1

<value>100000</value> 2

<value>100</value> 3

<value>10</value> 2

<value>1</value> 3

<value>200000</value> 2

<value>3000</value> 1

<value>5000</value> 1

<value>5</value> 1

<value>false</value> 3

<value>hdfs://master.dragon.org:9000</value> 1

<value>master.dragon.org:9001</value> 1

A 11

ACL 13

AdminOperationsProtocol, 1

By 1

Capacity 2

CapacityScheduler. 1

ClientDatanodeProtocol, 1

ClientProtocol, 1

Comma 2

DatanodeProtocol, 1

Default 1

DistributedFileSystem. 1

Each 1

Fair 2

For 13

Hadoop. 1

If 9

Initialization 2

InterDatanodeProtocol, 1

InterTrackerProtocol, 1

Irrespective 2

It 2

Its 1

Job 1

JobSubmissionProtocol, 1

JobTracker. 1

Map/Reduce 2

NamenodeProtocol, 1

Once 4

One 1

Put 4

RefreshAuthorizationPolicyProtocol, 1

Scheduler 1

Scheduler. 1

So 1

TaskUmbilicalProtocol, 1

The 29

This 6

With 1

You 1

a 49

above 2

absence 1

absolute 1

accepted 2

accordingly. 1

account 2

acls 1

acquire 1

across 4

added 1

administrators 2

affected. 1

all 21

allocated 1

allocations 1

allow 1

allowed 6

allowed.</description> 10

also 1

amount 2

and 32

any 2

applied 1

appropriate 1

are 21

as, 1

assigned. 1

at 4

authorization 2

available 1

based 1

be 20

being 1

between 1

beyond 1

blank. 12

block 1

by 28

can 18

cannot 1

capacity 8

capacity. 1

certain 2

change. 1

client-to-datanode 1

clients 1

cluster 6

cluster's 1

cluster, 1

cluster. 2

code 1

comma-separated 10

commands 2

communciate 1

communicate 4

competition 1

complete 1

concurrently, 1

concurrently. 5

config 1

configuration 6

configuration, 2

configuration. 1

configure 1

configured 3

consume 1

contains 1

convention,such 1

could 2

curtail 1

datanodes 1

decisions 1

decisions. 1

default 7

default, 1

default. 1

defines 1

depends 1

details, 1

determine 3

dfsadmin 1

disk. 4

do 4

documentation 2

don't 1

e.g. 12

enabled 2

enforces 1

equal 3

etc. 1

example, 1

exceed 4

excess 1

explained 1

file 3

file. 5

follow 1

for 24

format 1

former 1

from 1

generation 1

get 2

given 2

greater 2

group 24

group1,group2. 2

guarantees 1

have 3

his/her 1

how 1

href="configuration.xsl"?> 5
http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html. 1

if 4

implies 2

important 1

imposed. 1

in 27

in-effect. 1

includes 1

increase 1

initialize 2

initialize. 1

initialized 4

initialized, 1

inter-datanode 1

into 2

irrespective 1

is 39

it 2

its 1

job 11

job's 1

job, 1

jobs 23

jobs, 1

jobs. 3

jobtracker 1

jobtracker. 1

kill 1

large 1

latter 1

lead 1

lesser 1

limit 8

limit. 1

limited 1

limits 1

list 26

long 1

manager 1

map 1

mapred.acls.enabled 2

mapred.capacity-scheduler.queue.<queue-name>.property-name. 1

mapreduce.cluster.administrators 2

max 2

maximum 5

maximum-capacity 3

means 13

mentioned 1

miliseconds 1

minimum 2

modify 1

more 6

mradmin 1

mradmins 1

much 1

multipe 2

multiple 2

namenode 1

namenode. 2

names 2

names. 10

naming 1

nature 1

no 8

nodes 2

note 1

number 14

occupying 1

of 67

on 9

only 3

operation. 2

operations 2

or 4

other 1

overrides 4

owner 1

parameters 2

parent 1

particular 2

per-user, 2

percentage 4

point 1

policy 1

poll 1

poller 1

pool 1

pre-emption, 1

priorities 2

priority 1

properties 2

property 11

protocol 3

provides 1

querying 1

queue 12

queue's 3

queue, 5

queue-capacity 1

queue-capacity) 2

queue. 6

queued 4

queues 6

queues. 3

racks 1

recovery. 1

reduce 1

refresh 2

related 1

resource 1

resources 2

resources. 3

running 1

sample 1

scheduler 2

scheduler. 2

scheduling 3

secondary 1

security 1

separated 14

set 7

setting 2

settings 1

single 5

site-specific 4

slots 2

slots. 1

so 1

space), 2

special 12

started 2

status 1

submission, 1

submit 4

submits 1

submitted 2

suppose 1

system 1

taken 2

task 1

tasks 3

tasks, 2

tasktracker. 1

tasktrackers 1

template 1

terms 1

than 5

that 4

the 81

them. 1

then 3

there 2

they 4

thing 1

third 1

this 16

thread 2

threads 2

time 2

time, 1

timestamp. 1

to 43

true, 2

true. 2

two 1

type="text/xsl" 5

updating 1

use 5

use. 1

used 15

user 40

user's 4

user1,user2 2

users 14

users, 1

users,wheel". 10

value 15

value. 2

values 1

various 1

vary 1

version="1.0"?> 7

via 3

view 1

which 17

who 3

will 8

with 5

worker 1

would 7

OK!!!