
Big Data Primer, Day 5: Offline Computing with Hadoop (Part 1), Overview and Cluster Installation

2018-01-26 15:39 495 views

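I. Overview

  Following the earlier rule of always visiting the official site of any technology we learn, we start by finding Hadoop's: http://hadoop.apache.org/

  1. What is Hadoop

    First, the introduction from the official site: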


  The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

  The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.


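    In plain terms:


Hadoop is an open-source software platform under Apache, written in Java.

What Hadoop provides: using a cluster of servers, it runs distributed processing over massive data sets according to user-defined business logic.

Hadoop's core components are:
HDFS (distributed file system)

YARN (compute resource scheduling system)

MAPREDUCE (distributed computation programming framework)

In the broad sense, "Hadoop" usually refers to a wider concept: the Hadoop ecosystem.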


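  An overview on w3cschool: https://www.w3cschool.cn/hadoop/hadoop-3rpe22xm.html

  Recommended reading: "Hadoop: The Definitive Guide"

    Where Hadoop fits:


At present, the two underlying technologies supporting cloud computing are "virtualization" and "big data technology".

Hadoop is one of the solutions for the PaaS layer of cloud computing; it is not the same thing as PaaS, let alone cloud computing itself.


  For the PaaS concept mentioned above, see the community article "Understand Cloud Computing Concepts in Ten Minutes".

  The key point to note here is the essence of cloud computing: the division of labor in society!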

   1.5. Hadoop version history

    Version 2.0 added the YARN module!

    The tangled history of Hadoop versions:

    https://www.cnblogs.com/meet/p/5435979.html

    Illustrated: http://blog.csdn.net/matthewei6/article/details/50499343

    The commercial distribution CDH:

    http://blog.csdn.net/duyuanhai/article/details/54908298

  2. Hadoop core components



Hadoop Common: The common utilities that support the other Hadoop modules.

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

Hadoop YARN: A framework for job scheduling and cluster resource management.

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.


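  In other words: a distributed file system, distributed resource management, and a distributed computation programming framework (all built on the shared Common utilities).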
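
  To make the "distributed computation programming framework" idea concrete, here is a minimal sketch of the map / shuffle / reduce model as a word count. It is a single-process illustration in plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for this example, and a real job would run these phases across many machines, with YARN allocating the resources and HDFS holding the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all counts collected for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order, all in one process (illustration only)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one line of a file.
    sample = ["hadoop hdfs yarn", "hadoop mapreduce", "yarn hadoop"]
    print(word_count(sample))
```

  The user only writes the map and reduce logic; grouping by key and spreading the work over the cluster is what the framework takes care of.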

  Of course, this is only Hadoop in the narrow sense; in the broad sense, Hadoop means the whole Hadoop ecosystem:

  

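5/ On the number of replicas
The replica count is determined by the client-side parameter dfs.replication (priority: conf.set > custom configuration file > hdfs-default.xml inside the jar).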
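
The priority order above can be modelled as a layered lookup in which the first layer that defines the key wins. The following is only a sketch of that precedence rule in plain Python, not Hadoop's Configuration class: effective_replication and the dictionaries are hypothetical names for this example, and the only real-world value assumed is that hdfs-default.xml ships dfs.replication = 3.

```python
from collections import ChainMap
from typing import Mapping

# Lowest priority: defaults bundled in the Hadoop jar (hdfs-default.xml).
# Assumption for this sketch: the bundled default for dfs.replication is 3.
JAR_DEFAULTS = {"dfs.replication": "3"}


def effective_replication(custom_file: Mapping[str, str],
                          code_overrides: Mapping[str, str]) -> int:
    """Resolve dfs.replication using: conf.set > custom config file > jar default.

    ChainMap searches its mappings left to right, so the first layer
    that contains the key determines the result.
    """
    layers = ChainMap(dict(code_overrides), dict(custom_file), JAR_DEFAULTS)
    return int(layers["dfs.replication"])


if __name__ == "__main__":
    # Nothing overridden: the jar default applies.
    print(effective_replication({}, {}))
    # A custom configuration file on the client overrides the jar default.
    print(effective_replication({"dfs.replication": "2"}, {}))
    # A value set in code (the conf.set case) overrides both.
    print(effective_replication({"dfs.replication": "2"}, {"dfs.replication": "1"}))
```

Because the parameter is read on the client side, two clients writing to the same cluster can end up with different replica counts for their files.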


Number of replicas