Spark -- Spark 0.9.0 Installation [1]
2014-04-01 10:07
Spark:
Spark is a next-generation in-memory MapReduce computing framework. It offers order-of-magnitude performance improvements and also supports interactive queries, stream processing, and graph computation. Java and Scala are supported.

Use cases:
1. High-performance machine learning
2. Interactive, ad-hoc computation

Download:
http://spark.apache.org/downloads.html

Installation:
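After downloading, unpack the release on each node. A minimal sketch (the file name is illustrative; use whichever prebuilt package you picked from the downloads page):

tar -xzf spark-0.9.1-bin-hadoop2.tgz   # illustrative file name
cd spark-0.9.1-bin-hadoop2             # all paths below are relative to this directory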
Spark standalone mode:
This mode runs a self-contained Spark cluster, or a single Spark test or development machine.
1. Install a prebuilt Spark release on every node of the cluster; you can also build it from source yourself. In conf/slaves, list the hostname of every worker you want to use, similar to how Hadoop's slaves file is configured (see the example below).
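A minimal conf/slaves sketch, with one worker hostname per line (the hostnames are hypothetical):

# conf/slaves -- one worker hostname per line
worker1.example.com
worker2.example.com
worker3.example.com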
2. Start the Spark master:
./sbin/start-master.sh
3. Once started, the master prints a spark://HOST:PORT URL; you can also find this URL on the master's web UI at http://localhost:8080.
4. Start a worker and connect it to the master with:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
5. Monitor the cluster through the master's web UI at http://localhost:8080; the worker you just started should appear there.
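As a quick smoke test that the worker registered, you can point the interactive shell at the cluster and run a trivial job. A sketch, assuming the master runs on a host named master1 at the default port 7077:

MASTER=spark://master1:7077 ./bin/spark-shell
# Inside the shell:
# scala> sc.parallelize(1 to 1000).count()   // should print 1000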
6. Again like Hadoop, the cluster scripts require passwordless ssh access from the master to the workers (a key setup sketch follows).
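A minimal passwordless-ssh sketch, assuming the same user account exists on every node (run on the master; worker1 is a placeholder hostname):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # skip if a key pair already exists
ssh-copy-id user@worker1                   # repeat for each worker host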
7. Manage the cluster with the following scripts in SPARK_HOME/sbin:
sbin/start-master.sh - start a master instance.
sbin/start-slaves.sh - start a worker instance on each machine listed in conf/slaves.
sbin/start-all.sh - start the whole cluster.
sbin/stop-master.sh - stop the master started via sbin/start-master.sh.
sbin/stop-slaves.sh - stop the workers started via sbin/start-slaves.sh.
sbin/stop-all.sh - stop the whole cluster.
8. Copy conf/spark-env.sh.template to conf/spark-env.sh to configure the environment variables below (a sample file follows the table).
| Environment Variable | Meaning |
| --- | --- |
| SPARK_MASTER_IP | Bind the master to a specific IP address, for example a public one. |
| SPARK_MASTER_PORT | Start the master on a different port (default: 7077). |
| SPARK_MASTER_WEBUI_PORT | Port for the master web UI (default: 8080). |
| SPARK_WORKER_PORT | Start the Spark worker on a specific port (default: random). |
| SPARK_WORKER_DIR | Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work). |
| SPARK_WORKER_CORES | Total number of cores to allow Spark applications to use on the machine (default: all available cores). |
| SPARK_WORKER_MEMORY | Total amount of memory to allow Spark applications to use on the machine, e.g. 1000m, 2g (default: total memory minus 1 GB); note that each application's individual memory is configured using its spark.executor.memory property. |
| SPARK_WORKER_WEBUI_PORT | Port for the worker web UI (default: 8081). |
| SPARK_WORKER_INSTANCES | Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores. |
| SPARK_DAEMON_MEMORY | Memory to allocate to the Spark master and worker daemons themselves (default: 512m). |
| SPARK_DAEMON_JAVA_OPTS | JVM options for the Spark master and worker daemons themselves (default: none). |
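A minimal conf/spark-env.sh sketch, assuming a master host named master1 and workers with 8 cores and 16 GB of RAM (all names and values are illustrative):

#!/usr/bin/env bash
# conf/spark-env.sh -- sourced by the start scripts on every node
export SPARK_MASTER_IP=master1            # hypothetical master hostname/IP
export SPARK_MASTER_PORT=7077             # default master port
export SPARK_WORKER_CORES=8               # cores each worker may offer to applications
export SPARK_WORKER_MEMORY=14g            # leave ~2 GB for the OS and daemons
export SPARK_WORKER_DIR=/data/spark/work  # illustrative path for logs and scratch space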