
Setting Up a Spark 2 Development Environment in Scala IDE and Running an Example

2016-10-13 16:37
Setting up a Spark 2.0 development environment with Scala IDE on Windows.

1. Create a Maven Project

In Scala IDE (Eclipse), File > New > Other... > Maven > Maven Project creates the project skeleton.



2. Edit pom.xml

I spent a lot of time getting pom.xml right; existing pom.xml files from the Maven Repository and from projects on GitHub are useful references.

My final pom.xml is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.big</groupId>
  <artifactId>mydata</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>
  </dependencies>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
    <defaultGoal>compile</defaultGoal>
  </build>
</project>
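
A common pom.xml pitfall is mixing Scala versions: scala-library must stay on a 2.10.x release here, because the _2.10 suffix of spark-core_2.10 is the Scala binary version that artifact was built against. As a quick sanity check (my own addition, not part of the original walkthrough), a tiny program can print the Scala version the project actually runs against:

object VersionCheck extends App {
  // Prints the scala-library version on the classpath, e.g. "version 2.10.6";
  // it should agree with the _2.10 suffix of the Spark dependency.
  println(scala.util.Properties.versionString)
}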

3. Adjust the Project Configuration

Right-click the project and choose Configure -> Add Scala Nature.

Then right-click again, open Properties, and adjust the Scala compiler settings in the dialog. Because the pom depends on spark-core_2.10, the project's Scala compiler version must be set to 2.10 to match.

4. Write a Spark Example

Under /src/main/scala, create SimpleApp.scala:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "C:/spark/README.md" // Should be some file on your system
    // Run locally in a single JVM; no cluster is needed
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    // Read the file as an RDD of lines (2 partitions) and cache it, since it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
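
For reference, Spark 2.0 also introduces SparkSession as the unified entry point. A minimal sketch of the same example written against SparkSession (my addition; it assumes spark-sql_2.10 version 2.0.0 is added to the pom next to spark-core) could look like this:

import org.apache.spark.sql.SparkSession

object SimpleAppSession {
  def main(args: Array[String]) {
    // SparkSession wraps SparkContext and is the recommended Spark 2.x entry point
    val spark = SparkSession.builder()
      .appName("Simple Application")
      .master("local")
      .getOrCreate()
    // Dataset[String] of lines, cached because it is scanned twice
    val logData = spark.read.textFile("C:/spark/README.md").cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}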

5. Run the Example

Right-click pom.xml and choose Run As -> Maven build.

If the build succeeds, right-click SimpleApp.scala and run it as a Scala Application.

You should then see log output like the following in the console, which means everything worked:

............................
16/10/13 16:32:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 43 ms on localhost (2/2)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/10/13 16:32:20 INFO DAGScheduler: Job 0 finished: count at SimpleApp.scala:15, took 0.414282 s
16/10/13 16:32:20 INFO SparkContext: Starting job: count at SimpleApp.scala:16
16/10/13 16:32:20 INFO DAGScheduler: Got job 1 (count at SimpleApp.scala:16) with 2 output partitions
16/10/13 16:32:20 INFO DAGScheduler: Final stage: ResultStage 1 (count at SimpleApp.scala:16)
16/10/13 16:32:20 INFO DAGScheduler: Parents of final stage: List()
16/10/13 16:32:20 INFO DAGScheduler: Missing parents: List()
16/10/13 16:32:20 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:16), which has no missing parents
16/10/13 16:32:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 897.5 MB)
16/10/13 16:32:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1965.0 B, free 897.5 MB)
16/10/13 16:32:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.56.182.2:52894 (size: 1965.0 B, free: 897.6 MB)
16/10/13 16:32:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
16/10/13 16:32:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:16)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/10/13 16:32:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0, PROCESS_LOCAL, 5278 bytes)
16/10/13 16:32:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
16/10/13 16:32:20 INFO BlockManager: Found block rdd_1_0 locally
16/10/13 16:32:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 954 bytes result sent to driver
16/10/13 16:32:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, partition 1, PROCESS_LOCAL, 5278 bytes)
16/10/13 16:32:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
16/10/13 16:32:20 INFO BlockManager: Found block rdd_1_1 locally
16/10/13 16:32:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 954 bytes result sent to driver
16/10/13 16:32:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 32 ms on localhost (1/2)
16/10/13 16:32:20 INFO DAGScheduler: ResultStage 1 (count at SimpleApp.scala:16) finished in 0.038 s
16/10/13 16:32:20 INFO DAGScheduler: Job 1 finished: count at SimpleApp.scala:16, took 0.069170 s
16/10/13 16:32:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 21 ms on localhost (2/2)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
Lines with a: 61, Lines with b: 27
16/10/13 16:32:20 INFO SparkContext: Invoking stop() from shutdown hook
16/10/13 16:32:20 INFO SparkUI: Stopped Spark web UI at http://10.56.182.2:4040
16/10/13 16:32:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/13 16:32:20 INFO MemoryStore: MemoryStore cleared
16/10/13 16:32:20 INFO BlockManager: BlockManager stopped
16/10/13 16:32:20 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/13 16:32:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/13 16:32:20 INFO SparkContext: Successfully stopped SparkContext
16/10/13 16:32:20 INFO ShutdownHookManager: Shutdown hook called
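
A closing note (my addition, not from the original post): the INFO chatter above can be reduced to warnings and errors, so that the program's own println output stands out, by lowering the log level right after the SparkContext is created:

// Optional: silence Spark's INFO logging in the console
sc.setLogLevel("WARN")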