Spark Basics (3): Building a Scala-based Spark Application with Maven
2015-06-14 20:11
This chapter explains how to build our Spark application with Maven.
First, install Maven; on CentOS 7 it can be installed directly with yum install maven.
Then, following Maven's conventions, create the following directory layout:
spark-hello/
spark-hello/src
spark-hello/src/main
spark-hello/src/main/scala
spark-hello/src/main/scala/com
spark-hello/src/main/scala/com/spark
spark-hello/src/main/scala/com/spark/demo1
spark-hello/src/main/scala/com/spark/demo1/App.scala
spark-hello/pom.xml
spark-hello/target
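If you prefer not to create the tree by hand, the layout above can be set up in one go (a minimal sketch; the directory names simply mirror the listing above, and target/ is omitted because Maven generates it during the build):

```shell
# Create the Maven-style Scala source tree for the spark-hello project
mkdir -p spark-hello/src/main/scala/com/spark/demo1

# Empty placeholders for the files we fill in below
touch spark-hello/src/main/scala/com/spark/demo1/App.scala
touch spark-hello/pom.xml
```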
Edit pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.spark.demo1</groupId>
<artifactId>spark-hello</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderful Scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.11.6</scala.version>
</properties>
<!--
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
-->
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<!-- Test -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scala-tools.testing</groupId>
<artifactId>specs_2.9.3</artifactId>
<version>1.6.9</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest</artifactId>
<version>1.2</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.10</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<!-- If you have classpath issue like NoDefClassError,... -->
<!-- useManifestOnlyJar>false</useManifestOnlyJar -->
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
The important parts here are the Maven coordinates our application is packaged under, and the versions of Scala and Spark it depends on.
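One caveat worth noting: the pom declares scala-library 2.11.0 (and scala.version 2.11.6) but depends on spark-core_2.10, the Scala 2.10 build of Spark. Mixing Scala binary versions like this can cause runtime linkage errors. A safer variant is to align both on one binary version, for example (a sketch assuming the _2.11 artifact of Spark 1.3.1, which is also published on Maven Central):

```xml
<properties>
  <scala.version>2.11.6</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
</properties>
...
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>1.3.1</version>
</dependency>
```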
Next, write our App.scala:
package com.spark.demo1

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

/**
 * @author ${user.name}
 */
object App {
  def main(args: Array[String]) {
    // Point this at the README under your Spark installation directory
    val logFile = "/usr/local/spark/spark-1.3.1-bin-hadoop2.6/README.md"
    val conf = new SparkConf().setAppName("App")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
Then run mvn clean package to build the program. Maven will download the Scala, Spark, and other dependencies, then compile and package:
[root@localhost spark-hello]# mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building spark-hello 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO]
[INFO] --- maven-scala-plugin:2.15.2:testCompile (default) @ spark-hello ---
[WARNING] No source files found.
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ spark-hello ---
[INFO] Surefire report directory: /usr/local/maven-study/spark-hello/target/surefire-reports
.........................................................
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ spark-hello ---
[INFO] Building jar: /usr/local/maven-study/spark-hello/target/spark-hello-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14.238s
[INFO] Finished at: Sun Jun 14 20:08:32 CST 2015
[INFO] Final Memory: 13M/32M
[INFO] ------------------------------------------------------------------------
[root@localhost spark-hello]#
Finally, submit and run it with spark-submit:
[root@localhost spark-hello]# spark-submit --class "com.spark.demo1.App" --master local[2] target/spark-hello-1.0-SNAPSHOT.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/14 20:09:55 INFO SparkContext: Running Spark version 1.3.1
....................................................
15/06/14 20:09:59 INFO DAGScheduler: Stage 1 (count at App.scala:18) finished in 0.033 s
15/06/14 20:09:59 INFO DAGScheduler: Job 1 finished: count at App.scala:18, took 0.068178 s
Lines with a: 60,Lines with b: 29
[root@localhost spark-hello]#