
Meanings of Spark error IDs

2013-11-06 20:57


The approach below generalizes: use the error ID reported by the system to track down where the problem lies.

1 command exit code 137

1 Caused by insufficient system resources: Spark memory on a worker node exceeded the machine's actual physical memory, so the operating system killed the executor process. Exit code 137 = 128 + 9, i.e. the process received SIGKILL.

2 Run ulimit -a to see which system limits can be adjusted, e.g. ulimit -v unlimited to lift the virtual-memory cap.
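To confirm it was the OS rather than Spark that killed the process, check the kernel log and then shrink the worker JVM so it fits in physical RAM. A minimal sketch, assuming a standalone deployment where memory is set via the 0.8-era SPARK_MEM variable in conf/spark-env.sh; the 4g value is only an example, not a recommendation:

  dmesg | grep -i "killed process"   # look for kernel OOM-killer messages
  ulimit -a                          # inspect per-process resource limits
  # then lower the worker JVM heap so it fits in physical memory:
  export SPARK_MEM=4g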

2 Too many open files

Raise the maximum open-file limit:

 ulimit -n 124000

In /etc/security/limits.conf, the soft and hard entries are where the limit is changed persistently.
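The corresponding limits.conf entries look like the sketch below; the * domain and the 124000 value are illustrative, matching the ulimit example above:

  *    soft    nofile    124000
  *    hard    nofile    124000

Note that ulimit -n only affects the current shell session, while limits.conf entries take effect at the next login.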

3 exit code 255

Error:

java.io.IOException: Task process exit with nonzero status of 255.

at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Cause:

Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values. By default, both are 24 hours. These might be the reason for the failure, though this is not certain.

References:

http://grepalex.com/2012/11/12/hadoop-logging/

http://218.245.3.161/2013/03/31/5965
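For illustration, a mapred-site.xml sketch raising both settings to 48 hours; the values are arbitrary examples, not tuned recommendations, and note that retirejob.interval is specified in milliseconds:

  <property>
    <name>mapred.jobtracker.retirejob.interval</name>
    <value>172800000</value>  <!-- 48 hours, in milliseconds -->
  </property>
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>48</value>
  </property>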

13/11/07 08:06:17 INFO cluster.ClusterTaskSetManager: Loss was due to org.apache.spark.SparkException

org.apache.spark.SparkException: Error communicating with MapOutputTracker

        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:84)

        at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:170)

        at org.apache.spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:39)

        at org.apache.spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:125)

        at org.apache.spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:116)

        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)

        at scala.collection.immutable.List.foreach(List.scala:76)

The relevant source is in org.apache.spark.MapOutputTracker:

private[spark] class MapOutputTracker extends Logging {

  private val timeout = Duration.create(System.getProperty("spark.akka.askTimeout", "10").toLong, "seconds")

  

  // Set to the MapOutputTrackerActor living on the driver

  var trackerActor: ActorRef = _

  private var mapStatuses = new TimeStampedHashMap[Int, Array[MapStatus]]

  // Incremented every time a fetch fails so that client nodes know to clear

  // their cache of map output locations if this happens.

  private var epoch: Long = 0

  private val epochLock = new java.lang.Object

  // Cache a serialized version of the output statuses for each shuffle to send them out faster

  var cacheEpoch = epoch

  private val cachedSerializedStatuses = new TimeStampedHashMap[Int, Array[Byte]]

  val metadataCleaner = new MetadataCleaner("MapOutputTracker", this.cleanup)

  // Send a message to the trackerActor and get its result within a default timeout, or

  // throw a SparkException if this fails.

  def askTracker(message: Any): Any = {

    try {

      val future = trackerActor.ask(message)(timeout)

      return Await.result(future, timeout)

    } catch {

      case e: Exception =>

        throw new SparkException("Error communicating with MapOutputTracker", e)

    }

  }

Try increasing -Dspark.akka.askTimeout.
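Since the timeout is read with System.getProperty (see askTracker above), it can be raised as a plain JVM system property. A minimal sketch, assuming the 0.8-era SPARK_JAVA_OPTS hook in conf/spark-env.sh; 30 seconds is an arbitrary example value (the default is 10):

  # raise the MapOutputTracker ask timeout from 10 s to 30 s (example value)
  export SPARK_JAVA_OPTS="-Dspark.akka.askTimeout=30"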
