how-to-configure-and-use-spark-history-server
2015-03-25 15:18
References
- Spark Configuration
- Spark Monitoring: Viewing After the Fact
Basics
How do you configure Spark? The places where Spark can be configured:
Spark properties, set in application code via val sparkConf = new SparkConf().set…
Dynamically loading Spark properties, via spark-submit options:
./bin/spark-submit --name "My app" --master local[4] --conf spark.shuffle.spill=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
Viewing Spark properties, via the application web UI: http://<driver>:4040, under the "Environment" tab
Environment variables, via conf/spark-env.sh
Logging, via conf/log4j.properties
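For the logging entry, a minimal conf/log4j.properties sketch, patterned on the conf/log4j.properties.template that ships with Spark (the console appender and pattern below are just one common choice):

```properties
# Log everything at INFO and above to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```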
Available properties for the history server:

| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.eventLog.compress | false | Whether to compress logged events, if spark.eventLog.enabled is true. |
| spark.eventLog.dir | file:///tmp/spark-events | Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server. |
| spark.eventLog.enabled | false | Whether to log Spark events, useful for reconstructing the Web UI after the application has finished. |
(remaining properties omitted)
Enable the Spark eventlog
Method 1: bin/spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:/data01/data_tmp/spark-events ...
Method 2:
vi conf/spark-env.sh
SPARK_DAEMON_JAVA_OPTS=
SPARK_MASTER_OPTS=
SPARK_WORKER_OPTS=
SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events"
NOTE:
a. spark.eventLog.dir=file:/data01/data_tmp/spark-events
The eventlog is written by the driver, so local/standalone/yarn-client modes pose no real problem, but yarn-cluster mode needs special care: an HDFS path is the safest choice there.
The directory named by spark.eventLog.dir is not created automatically; create it by hand, with the appropriate permissions.
b. These environment variables are read by bin/spark-class.
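A sketch for note (a): the directory must exist before the driver writes to it. It is shown under /tmp here for safety; substitute the real spark.eventLog.dir path (e.g. /data01/data_tmp/spark-events), or use hdfs dfs -mkdir -p for an HDFS path.

```shell
# spark.eventLog.dir is not auto-created: make it first and open up permissions
EVENTLOG_DIR=/tmp/spark-events   # demo path; use your real spark.eventLog.dir
mkdir -p "$EVENTLOG_DIR"
chmod 775 "$EVENTLOG_DIR"        # the driver user must be able to write here
ls -ld "$EVENTLOG_DIR"
```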
Test a Spark app after enabling the eventlog
Test case:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount extends App {
  val sparkConf = new SparkConf().setAppName("WordCount")
  val sc = new SparkContext(sparkConf)
  val lines = sc.textFile("file:/data01/data/datadir_github/spark/README.md")
  val words = lines.flatMap(_.split("\\s+"))
  val wordsCount = words.map(word => (word, 1)).reduceByKey(_ + _)
  wordsCount.foreach(println)
  sc.stop()
}
Test issue 1: running Spark in local mode from IDEA produced no eventlog; the bundled examples.SparkPi behaved the same way.
Fix attempt 1: test from the CLI
bin/run-example SparkPi
Result: it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Log directory file:/data01/data_tmp/spark-events does not exist.
Fix attempt 2:
mkdir -p /data01/data_tmp/spark-events
bin/run-example SparkPi
Result: confirmed that an eventlog is now generated under /data01/data_tmp/spark-events
Running Spark in local mode from IDEA again, there was still no eventlog.
Cause analysis: in the IDEA test, the SPARK_CONF_DIR path was added to the dependencies, but running inside IDEA is not like the CLI, where submitting an app through bin/spark-submit reads and parses the settings in conf/spark-env.sh.
Fix attempt 3:
Set VM options in the IDEA run configuration:
-Dspark.master="local[2]" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events
Result: local mode now works.
Issue 2: testing spark-on-yarn from IDEA fails (not yet resolved)
Setting VM options in the IDEA run configuration:
-Dspark.master="yarn-client" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events
fails with an error.
Configure, start and use the Spark History Server
Configure:
vi conf/spark-env.sh
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/data01/data_tmp/spark-events" # set when you use the spark history server
NOTE:
spark.history.fs.logDirectory and spark.eventLog.dir may differ, which means the eventlog files can be moved elsewhere; this makes it easier to hand them over for diagnosis.
spark.history.ui.port sets the web UI port (default: 18080).
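Putting the note together, a fuller conf/spark-env.sh fragment might look like this (the port is Spark's default, and spark.history.retainedApplications, which caps how many completed applications the server keeps loaded, is set to an illustrative value):

```shell
# conf/spark-env.sh -- history server settings (values are examples)
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/data01/data_tmp/spark-events \
 -Dspark.history.ui.port=18080 \
 -Dspark.history.retainedApplications=50"
```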
Start:
sbin/start-history-server.sh
Access the Spark history web UI at http://<server-url>:18080
(screenshot: Spark History Server web UI, applications list)
(screenshot: Spark History Server web UI, a specific app)