
Setting up a Spark development environment on Windows

2015-04-14 10:43

Spark 1.2.1 development environment setup (for Windows)


1. Environment preparation

Download and install Scala. The MSI installer is recommended, since you can install it by simply double-clicking.
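To confirm the installation, open a Windows command prompt and type scala to start the REPL (this assumes the installer added scala.exe to your PATH); the banner already shows the version, and the line below prints it explicitly:

println(scala.util.Properties.versionString)   // prints e.g. "version 2.10.5"

If a version string appears, Scala is installed and on the PATH.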

2. Installing IDEA

Download IntelliJ IDEA from the official site, jetbrains.com. There are Community and Ultimate editions; the former is free. Choose whichever edition suits you.

After installing IDEA following the setup wizard, you need to install the Scala plugin. There are two ways to do so:

Start IDEA -> Welcome to IntelliJ IDEA -> Configure -> Plugins -> Install JetBrains plugin… -> find Scala and install it.
Start IDEA -> Welcome to IntelliJ IDEA -> Open Project -> File -> Settings -> Plugins -> Install JetBrains plugin… -> find Scala and install it.

If you want the dark UI, choose Darcula under File -> Settings -> Appearance -> Theme. You will also need to change the default font, otherwise Chinese text in the menus will not display correctly.


3. Building a Scala application

A. Creating a new project

Create a project named sparkTest: start IDEA -> Welcome to IntelliJ IDEA -> Create New Project -> Scala -> Non-SBT -> create a project named sparkWordCountTest (be sure to select your installed JDK and Scala compiler here) -> Finish.
Add Maven support: right-click the project -> Add Framework Support… -> select Maven, in preparation for building the JAR automatically with Maven later.



Adding Maven support

Then configure pom.xml, adding the Spark and Hadoop dependencies:

Spark and Hadoop dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
</dependencies>

<properties>
    <!-- must match the Scala binary version of spark-core_2.10 -->
    <scala.version>2.10.5</scala.version>
</properties>


Note: Spark 1.2.1 and Hadoop 2.6.0 are used here; adjust the versions to match your own environment.

If you choose not to use Maven, you can add the dependency JARs manually instead.

Add the libraries: File -> Project Structure -> Libraries -> + -> Java -> select the following JARs from wherever you placed them:

spark-assembly-1.2.1-hadoop2.6.0.jar
scala-library.jar

The project directory structure is as follows:

B. Developing the Spark program
We use a simple WordCount program (adapted from the examples bundled with Spark) for testing:
import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCount {

  def main(args: Array[String]) {
    if (args.length != 2) {
      println("usage is org.test.WordCount <input> <output>")
      return
    }

    val conf = new SparkConf()
    conf.setMaster("spark://192.168.246.107:7077").setAppName("My WordCount")
    val sc = new SparkContext(conf)
    // read the input file, split each line on whitespace, and count each word
    val textFile = sc.textFile(args(0))
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1)).reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}
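Before packaging for the cluster, you can also sanity-check the same logic with a local master; below is a minimal sketch, assuming a hypothetical local input file D:\test\input.txt (as the notes below point out, running locally still requires a working Spark/Hadoop setup on the machine):

import org.apache.spark.{SparkConf, SparkContext}

// Quick local test of the same word count; D:\test\input.txt is a hypothetical sample file.
object WordCountLocal {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WordCount Local Test")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("D:\\test\\input.txt")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)   // print the counts instead of writing them out
    sc.stop()
  }
}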

Notes:

setMaster: the address of the master. (Set it to "local" to run locally, which requires a local Spark/Hadoop environment; or point it at a remote Spark master, e.g. spark://hadoop:7077.)

setAppName: the name of the application.

setSparkHome: the Spark installation directory.

setJars: the location of the JAR(s); set this to the path of the JAR produced when the project is compiled and packaged.
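Putting these four settings together, here is a sketch of a SparkConf for submitting directly from the IDE, using the example master, Spark home, and JAR path that appear elsewhere in this article (replace them with your own values):

import org.apache.spark.SparkConf

// Example values taken from this article; adjust to your own cluster and paths.
val conf = new SparkConf()
  .setMaster("spark://192.168.246.107:7077")           // address of the Spark master
  .setAppName("My WordCount")                          // application name shown in the web UI
  .setSparkHome("/opt/app/spark-1.2.1-bin-2.6.0")      // Spark installation directory
  .setJars(List("out\\sparkTest_jar\\sparkTest.jar"))  // JAR produced by packaging this project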

C. Generating the application package

Configure packaging for SparkTest.

Click the button in the lower-left corner and the following panel appears:

Maven commands

Double-click package to build the JAR.

In the figure above, "Building jar: D:\SparkWordCountTest\target\Test-1.0-SNAPSHOT.jar" is the output path of the generated JAR.

4. Deploying the Spark application

Upload the packaged Spark program to the cluster, then deploy and run it with spark-submit.

The full list of options is as follows:

[hadoop@bigdata bin]$ ./spark-submit --help
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

[hadoop@bigdata bin]$

The command used to run this example:

[hadoop@bigdata spark-1.2.1-bin-2.6.0]$ ./bin/spark-submit --class Test.WordCount /opt/app/spark-1.2.1-bin-2.6.0/test/Test-1.0-SNAPSHOT.jar /user/hadoop/test/input/test1.txt /user/hadoop/test/output00001

The last two arguments are the input file and the output directory, respectively.

Options such as --master spark://hadoop108:7077 and --executor-memory 300m can also be configured in spark-env.sh:

export JAVA_HOME=/opt/java/jdk1.7
export HADOOP_CONF_DIR=/opt/app/hadoop-2.6.0/etc/hadoop
export HIVE_CONF_DIR=/opt/app/hive-0.13.1/conf
export SCALA_HOME=/opt/app/scala-2.10.5
export HADOOP_HOME=/opt/app/hadoop-2.6.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=192.168.246.107
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_JAVA_OPTS=-Dspark.executor.memory=1g
export SPARK_HOME=/opt/app/spark-1.2.1-bin-2.6.0
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.2.1-hadoop2.6.0.jar
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_CLASSPATH=$SPARK_CLASSPATH:

Conclusion:

The setup tested in this article develops the Spark program with IntelliJ on Windows and submits it from the IDE to a separate Spark cluster. If you need a Windows-based development environment, this approach is a good option. (Windows itself has no Spark or Hadoop environment installed; of course, developing on a machine that already has Spark and Hadoop is even better.)

You only need to set:

val conf = new SparkConf().setAppName("Word Count").setMaster("spark://hadoop:7077").setJars(List("out\\sparkTest_jar\\sparkTest.jar"))

Set the master to your Spark cluster's master, and set the JAR path to where your compiled and packaged JAR is located.
Or:

val spark = new SparkContext("spark://hadoop:7077", "Word Count",
  "F:\\soft\\spark\\spark-1.2.1-bin-hadoop2.6",
  List("out\\sparkTest_jar\\sparkTest.jar"))