job触发流程原理剖析与源码分析
2017-10-14 11:35
453 查看
以wordcount流程解析
val lines = sc.textFile()
val words = lines.flatMap(line => line.split(” “)) val pairs =
words.map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _)
counts.foreach(count => println(count._1 + “: ” + count._2))
val lines = sc.textFile()
def textFile( path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = withScope { assertNotStopped() //hadoopFile()方法的调用,拿到Hadoop的配置文件,创建HadoopRDD,广播变量 hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], //执行map算子操作,剔除key,只保留Value,获得一个MapPartionsRDD。 //MapPartionsRDD里面就是一行一行的文本数据 minPartitions).map(pair => pair._2.toString).setName(path) }
val words = lines.flatMap(line => line.split(” “)) val pairs =
words.map(word => (word, 1))
// 其实RDD里是没有reduceByKey的,因此对RDD调用reduceByKey()方法的时候,会触发scala的隐式转换;此时就会在作用域内,寻找隐式转换,会在RDD中找到rddToPairRDDFunctions()隐式转换,然后将RDD转换为PairRDDFunctions。 // 接着会调用PairRDDFunctions中的reduceByKey()方法
val counts = pairs.reduceByKey(_ + _)
counts.foreach(count => println(count._1 + “: ” + count._2))
def runJob[T, U: ClassTag]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], resultHandler: (Int, U) => Unit): Unit = { if (stopped.get()) { throw new IllegalStateException("SparkContext has been shutdown") } val callSite = getCallSite val cleanedFunc = clean(func) logInfo("Starting job: " + callSite.shortForm) if (conf.getBoolean("spark.logLineage", false)) { logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString) } //调用SparkContext之前初始化创建的DAGScheduler的Runjob的方法。 dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get) progressBar.foreach(_.finishAll()) rdd.doCheckpoint() }
相关文章推荐
- Spark2.2 job触发流程原理剖析与源码分析
- 6.job触发流程原理剖析与源码分析
- Spark源码分析之Job触发原理
- Spark的job触发流程原理与stage划分算法分析
- Hbase 源码分析四 - Get 流程及rpc原理
- CacheManager原理剖析与源码分析
- spring Quartz 源码分析--触发器类CronTriggerBean源码剖析
- CacheManager原理剖析与源码分析
- checkpoint原理剖析与源码分析
- Spring MVC原理(二)请求处理流程源码分析
- Spark内核源码深度剖析:Master主备切换机制原理剖析与源码分析
- BlockManager原理剖析与源码分析
- Glide源码分析(六)——从DecodeJob相关实现看图片加载流程
- Worker原理剖析与源码分析
- Executor原理剖析与源码分析
- Spark源码之路(二):Master原理剖析与源码分析
- Hadoop源码分析24 JobTracker启动和心跳处理流程
- boost.asio源码剖析(三) ---- 流程分析
- spring Quartz 源码分析--触发器类SimpleTriggerBean源码剖析