Setting up a single-node pseudo-distributed Hadoop environment and trying out Mahout
2015-07-09 14:59
Single-node Hadoop installation
(1) Download the Hadoop tarball from http://hadoop.apache.org/releases.html and unpack it.
(2) Configure environment variables (e.g. in ~/.bashrc):
export PATH=$PATH:/home/iomssbd/user/hadoop-2.4.1/bin:/home/iomssbd/user/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/iomssbd/user/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
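After sourcing the exports, a quick sanity check helps catch a mistyped install path before anything depends on it. A minimal sketch (the install path is the article's; adjust to yours):

```shell
# Assumes the exports above have been sourced; verify the derived paths
# resolve consistently before relying on them.
export HADOOP_HOME=/home/iomssbd/user/hadoop-2.4.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
[ "$HADOOP_CONF_DIR" = "$HADOOP_HOME/etc/hadoop" ] && echo "conf dir OK"
# With a real install on PATH, `hadoop version` should now print the 2.4.1 banner.
```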
-------------------------------- Test that the Hadoop environment works --------------------------------
cd $HADOOP_HOME
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
cat ./output/*
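What the example job computes can be previewed locally with plain grep: it extracts every match of the regex `dfs[a-z.]+` from the input XML and counts occurrences. This is a local simulation, not the MapReduce job itself; the sample file and /tmp paths are illustrative:

```shell
# Simulate the example job: pull out each match of 'dfs[a-z.]+' from XML
# input and count occurrences, which is what the grep job writes to output/.
mkdir -p /tmp/grep_demo
printf '<name>dfs.replication</name>\n<name>dfs.namenode.name.dir</name>\n' > /tmp/grep_demo/sample.xml
grep -ohE 'dfs[a-z.]+' /tmp/grep_demo/*.xml | sort | uniq -c
```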
--------------------------------------------------------------------------------------------
(3) Edit the configuration files
1. Edit hadoop-env.sh under etc/hadoop in the Hadoop install directory:
export JAVA_HOME=/home/iomssbd/user/java/jdk1.7.0_67
export HADOOP_LOG_DIR=/home/iomssbd/user/hadoop-2.4.1/logs
2. Edit core-site.xml under etc/hadoop in the Hadoop install directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/iomssbd/user/hadoop-2.4.1/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9001</value>
</property>
<property>
  <name>hadoop.logfile.size</name>
  <value>1000000</value>
  <description>The max size of each log file</description>
</property>
<property>
  <name>hadoop.logfile.count</name>
  <value>5</value>
  <description>The max number of log files</description>
</property>
3. Edit hdfs-site.xml under etc/hadoop in the Hadoop install directory:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/iomssbd/user/hadoop-2.4.1/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/iomssbd/user/hadoop-2.4.1/tmp/dfs/data</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>localhost:50011</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>localhost:50076</value>
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>localhost:50021</value>
</property>
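Each `<property>` block in steps 2 and 3 must sit inside the file's single `<configuration>` root element, or Hadoop will ignore it. A minimal core-site.xml written as a sketch (paths are the article's; hdfs-site.xml follows the same structure):

```shell
# Write a minimal, well-formed core-site.xml to a scratch directory;
# in a real setup the target is $HADOOP_CONF_DIR/core-site.xml.
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/iomssbd/user/hadoop-2.4.1/tmp</value>
  </property>
</configuration>
EOF
grep -c '<property>' /tmp/hadoop-conf/core-site.xml
```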
(4) Start the pseudo-distributed HDFS cluster:
cd $HADOOP_HOME
sbin/start-dfs.sh
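Before the very first start, the NameNode must be formatted, a one-time step the listing above skips; afterwards `jps` should show the HDFS daemons. A sketch, with the Hadoop commands left as comments since they need a live install; the jps listing below is illustrative, not captured from a real run:

```shell
# One-time, before the first start-dfs.sh:
#   bin/hdfs namenode -format
# After start-dfs.sh, check the daemons with:
#   jps
# The three daemon names expected for pseudo-distributed HDFS, checked
# here against a sample listing:
sample_jps="12001 NameNode
12102 DataNode
12233 SecondaryNameNode"
for d in NameNode DataNode SecondaryNameNode; do
  echo "$sample_jps" | grep -q "$d" && echo "$d running"
done
```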
--------------------------------------------------------------------------------------------
Mahout installation and usage
(1) Download from http://archive.apache.org/dist/mahout/
(2) Unpack: tar -zxvf apache-mahout-distribution-0.10.1.tar.gz (-x extracts; -c would create an archive)
(3) Configure environment variables (e.g. in ~/.bashrc):
export MAHOUT_HOME=/home/iomssbd/user/apache-mahout-distribution-0.10.1
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
(4) Run the mahout command with no arguments to test the installation; if it prints the list of available Mahout programs, the setup succeeded.
--------------------- Test run -----------------------------------------------
(5) Download the test data: the file synthetic_control.data linked from http://kdd.ics.uci.edu/databases/synthetic_control/
(6) Create the HDFS directories:
hadoop fs -mkdir hdfs://localhost:9001/user
hadoop fs -mkdir hdfs://localhost:9001/user/iomssbd
(7) Upload the test file:
hadoop fs -put synthetic_control.data hdfs://localhost:9001/user/iomssbd/testdata
(8) Run the k-means example job (it reads from testdata under your HDFS home directory and writes to output):
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
(9) Inspect the results:
hadoop fs -ls /user/iomssbd/output
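The k-means driver typically leaves one directory per iteration plus the clustered points under output/. The exact names (e.g. clusters-10-final) depend on when the run converges, so treat the layout below, simulated locally, as an assumption to verify against your own `hadoop fs -ls`:

```shell
# Local simulation of a typical Mahout k-means output layout
# (directory names assumed; confirm with hadoop fs -ls on your cluster).
mkdir -p /tmp/kmeans_output/clusters-10-final /tmp/kmeans_output/clusteredPoints
ls /tmp/kmeans_output
```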