how-to-configure-and-use-spark-history-server

References

spark configuration

spark monitoring: Viewing After the Fact

Basics

how to configure spark?

locations to configure spark

spark properties, using val sparkConf = new SparkConf().set… in spark application code
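A minimal sketch of this pattern, mirroring the spark-submit example below (ConfExample is a hypothetical name; properties set on SparkConf take the highest precedence, ahead of spark-submit flags and conf/spark-defaults.conf):

import org.apache.spark.{SparkConf, SparkContext}

object ConfExample extends App {
  // Equivalent to passing --name/--master/--conf on spark-submit.
  val sparkConf = new SparkConf()
    .setAppName("My app")
    .setMaster("local[4]")
    .set("spark.shuffle.spill", "false")
  val sc = new SparkContext(sparkConf)
  sc.stop()
}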

Dynamically Loading Spark Properties using spark-submit options

./bin/spark-submit --name "My app" --master local[4] --conf spark.shuffle.spill=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar


Viewing Spark Properties using the Spark app WebUI at
http://<driver>:4040
in the “Environment” tab

environment variables, using conf/spark-env.sh

logging, using conf/log4j.properties

Available Properties related to the history server

Property Name           | Default                  | Meaning
spark.eventLog.compress | false                    | Whether to compress logged events, if spark.eventLog.enabled is true.
spark.eventLog.dir      | file:///tmp/spark-events | Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server.
spark.eventLog.enabled  | false                    | Whether to log Spark events, useful for reconstructing the Web UI after the application has finished.
Environment variables: the history server daemon itself is configured via SPARK_HISTORY_OPTS in conf/spark-env.sh (see the configure section below).



enable spark eventlog

method 1:

bin/spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:/data01/data_tmp/spark-events ...


method 2:

vi conf/spark-env.sh

SPARK_DAEMON_JAVA_OPTS=
SPARK_MASTER_OPTS=
SPARK_WORKER_OPTS=
SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events"


NOTE:

a. spark.eventLog.dir=file:/data01/data_tmp/spark-events

The eventlog is written by the driver, so for local/standalone/yarn-client modes this is not much of a problem, but for cluster deploy mode it needs special attention; it is best to set this to an HDFS path.

The directory specified by spark.eventLog.dir is not created automatically; it must be created manually, with the appropriate permissions (see the sketch after this note).

b. These environment variables are read by bin/spark-class.
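As a convenience, the local directory can also be created from code before the SparkContext starts. A minimal sketch (CreateEventLogDir is a hypothetical helper; an HDFS path would need the Hadoop FileSystem API instead, and mkdir -p as shown later works just as well):

import java.nio.file.{Files, Paths}

object CreateEventLogDir extends App {
  // spark.eventLog.dir is not created automatically; create the local
  // directory up front so the driver can write event logs into it.
  val eventLogDir = Paths.get("/data01/data_tmp/spark-events")
  if (!Files.exists(eventLogDir)) Files.createDirectories(eventLogDir)
}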

test a spark app after enabling the eventlog

test case:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount extends App {

  val sparkConf = new SparkConf().setAppName("WordCount")
  val sc = new SparkContext(sparkConf)

  // Read a local file and split each line into words.
  val lines = sc.textFile("file:/data01/data/datadir_github/spark/README.md")
  val words = lines.flatMap(_.split("\\s+"))

  // Classic word count: pair each word with 1, then sum the counts per key.
  val wordsCount = words.map(word => (word, 1)).reduceByKey(_ + _)

  wordsCount.foreach(println)

  sc.stop()
}
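To confirm whether the eventlog settings actually took effect, the effective configuration can be read back from the SparkContext. A small sketch, assuming it is placed just before sc.stop() in WordCount above:

// If spark.eventLog.enabled prints "false" here, no eventlog will be
// written for this run, whatever conf/spark-env.sh says.
println("spark.eventLog.enabled = " + sc.getConf.get("spark.eventLog.enabled", "false"))
println("spark.eventLog.dir     = " + sc.getConf.get("spark.eventLog.dir", "<unset>"))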


Test issue 1: running Spark in local mode from IDEA did not generate an eventlog; testing the bundled examples.SparkPi behaved the same.

Fix attempt 1: test from the CLI

bin/run-example SparkPi


The result is an error:

Exception in thread "main" java.lang.IllegalArgumentException: Log directory file:/data01/data_tmp/spark-events does not exist.

Fix attempt 2:

mkdir -p /data01/data_tmp/spark-events
bin/run-example SparkPi


Result: confirmed that an eventlog was generated under /data01/data_tmp/spark-events.

Testing again in IDEA, running Spark in local mode still generated no eventlog.

Cause analysis: when testing in IDEA, even though the SPARK_CONF_DIR path is added to the dependencies, running inside IDEA does not submit the app through bin/spark-submit the way the CLI does, so the settings in conf/spark-env.sh are never read and parsed.

Fix attempt 3:

Set VM options in the IDEA run configuration:
-Dspark.master="local[2]" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events


Result: local mode now works.
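Why the VM options work: when constructed with loadDefaults left at its default of true, SparkConf copies every JVM system property whose name starts with spark., so the -D flags above act like programmatic settings. A minimal sketch (SysPropExample is a hypothetical name):

import org.apache.spark.SparkConf

object SysPropExample extends App {
  // Equivalent to the -D VM options above: the no-arg SparkConf
  // constructor picks up every "spark."-prefixed system property.
  System.setProperty("spark.master", "local[2]")
  System.setProperty("spark.eventLog.enabled", "true")
  System.setProperty("spark.eventLog.dir", "file:/data01/data_tmp/spark-events")

  val conf = new SparkConf() // loadDefaults = true reads spark.* system properties
  println(conf.get("spark.eventLog.dir"))
}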

Issue 2: testing spark-on-yarn from IDEA fails with an error (not yet resolved)

Set VM options in the IDEA run configuration:
-Dspark.master="yarn-client" -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=file:/data01/data_tmp/spark-events
This raises an error.

configure, start and use spark history server

configure

vi conf/spark-env.sh

SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/data01/data_tmp/spark-events" # set when using the spark history server


NOTE

spark.history.fs.logDirectory and spark.eventLog.dir may differ, which means eventlog files can be moved elsewhere, handy when helping to diagnose problems.

spark.history.ui.port sets the port of the history server WebUI (18080 by default).

start

sbin/start-history-server.sh


access the spark history WebUI at
http://<server-url>:18080
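A quick sanity check that the server is actually listening is to fetch its front page. A minimal sketch, assuming the server runs locally (HistoryServerCheck and localhost are placeholders):

import scala.io.Source

object HistoryServerCheck extends App {
  // Fetch the history server front page; a successful read means the
  // server is listening on spark.history.ui.port (18080 by default).
  val page = Source.fromURL("http://localhost:18080").mkString
  println(page.take(200))
}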



(Screenshot: spark history server WebUI, application list)



(Screenshot: spark history server WebUI, a specific application)