Spark Machine Learning: Logistic Regression Demo
2017-09-12 10:42
These are notes on a few small examples from learning Spark ML. I ran all of them in spark-shell. For the full tutorial, see the official documentation:
http://spark.apache.org/docs/latest/ml-pipeline.html
Code example: Estimator, Transformer, and Param
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.{Row, SparkSession}

// Create the SparkSession (in spark-shell, getOrCreate() returns the existing one)
val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()
import spark.implicits._

// Prepare the training set: (label, features)
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

// Prepare the test set
val test = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
  (0.0, Vectors.dense(3.0, 2.0, -0.1)),
  (1.0, Vectors.dense(0.0, 2.2, -1.5))
)).toDF("label", "features")

// Create a LogisticRegression instance (an Estimator), print its parameters, and set some of them
val lr = new LogisticRegression()
println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")
lr.setMaxIter(10).setRegParam(0.01)

// Fit model1 (a Transformer) and print the parameters that were used to fit it
val model1 = lr.fit(training)
println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

// Specify parameters with a ParamMap; the later put(lr.maxIter, 30) overrides maxIter -> 20.
// Kept on one line so spark-shell does not end the statement early.
val paramMap = ParamMap(lr.maxIter -> 20).put(lr.maxIter, 30).put(lr.regParam -> 0.1, lr.threshold -> 0.55)

// ParamMaps can be combined with ++; this one also renames the probability output column
val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability")
val paramMapCombined = paramMap ++ paramMap2

// Fit model2 with the combined ParamMap, which overrides values set earlier through the setters
val model2 = lr.fit(training, paramMapCombined)
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

// Apply model2 to the test set
model2.transform(test).select("features", "label", "myProbability", "prediction").collect().foreach {
  case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
}
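For a binary problem, a fitted logistic regression model scores a feature vector x as p = σ(w·x + b). It reports the probability column as [1 − p, p] and predicts 1.0 when p exceeds `threshold`, which is 0.55 for model2 in the example above. The plain-Scala sketch below reproduces that arithmetic without Spark. The weights and intercept are made-up values for illustration, not what `lr.fit(training)` actually learns, and `predict` is a hypothetical helper, not a Spark API.

```scala
object LogisticScoring {
  // Logistic (sigmoid) function: maps a real-valued margin into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Hypothetical helper (not Spark's API): returns (probability vector, prediction).
  // probability(0) = P(label = 0), probability(1) = P(label = 1).
  def predict(weights: Array[Double], intercept: Double,
              features: Array[Double], threshold: Double): (Array[Double], Double) = {
    require(weights.length == features.length, "dimension mismatch")
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    val p1 = sigmoid(margin)
    // Predict class 1 only when P(label = 1) is strictly above the threshold.
    val prediction = if (p1 > threshold) 1.0 else 0.0
    (Array(1.0 - p1, p1), prediction)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative coefficients only; a real model gets these from lr.fit(training).
    val weights = Array(-2.0, 0.5, 1.0)
    val intercept = 0.3
    val testFeatures = Seq(
      Array(-1.0, 1.5, 1.3),
      Array(3.0, 2.0, -0.1),
      Array(0.0, 2.2, -1.5)
    )
    for (x <- testFeatures) {
      val (prob, pred) = predict(weights, intercept, x, threshold = 0.55)
      val xs = x.mkString("[", ",", "]")
      val ps = prob.mkString("[", ",", "]")
      println(s"$xs -> prob=$ps, prediction=$pred")
    }
  }
}
```

With these illustrative coefficients, the third test point gets p ≈ 0.475, so it is predicted as 0.0 at threshold 0.55. A threshold below 0.475 would flip it to 1.0, which is why setting `lr.threshold` changes the prediction column but not the probability column.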
Code example: Pipeline
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{Row, SparkSession}

// Create the SparkSession (in spark-shell, getOrCreate() returns the existing one)
val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()
import spark.implicits._

// Prepare the training set: (id, text, label)
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Prepare the test set (unlabeled): (id, text)
val test = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "spark hadoop spark"),
  (7L, "apache hadoop")
)).toDF("id", "text")

// Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr (logistic regression)
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fitting the pipeline produces a PipelineModel, which is a Transformer
val model = pipeline.fit(training)

// Save the fitted model
model.write.overwrite().save("/tmp/spark-logistic-regression-model")

// Save the unfitted pipeline (its stage configuration)
pipeline.write.overwrite().save("/tmp/unfit-lr-model")

// Load the fitted model back when it is needed
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")

// Apply the model to the test set (sameModel can be used here in the same way)
model.transform(test).select("id", "text", "probability", "prediction").collect().foreach {
  case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
}
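In the pipeline above, Tokenizer lowercases each text and splits it on whitespace. HashingTF then turns the resulting words into a fixed-length term-frequency vector (here 1000-dimensional) using the hashing trick: each term is hashed to a bucket index modulo numFeatures, and the vector counts how many terms land in each bucket.

The plain-Scala sketch below mimics those two stages so the transformation can be seen without Spark. It is an illustration under assumptions. It uses Scala's `MurmurHash3.stringHash`, not the exact hash Spark applies, so its bucket numbers will not match Spark's `features` column. The names `tokenize`, `bucket`, and `termFrequencies` are hypothetical helpers, not Spark APIs.

```scala
import scala.util.hashing.MurmurHash3

object HashingSketch {
  // Lowercase and split on whitespace, similar in spirit to Spark ML's Tokenizer.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Map a term to a bucket in [0, numFeatures). Uses Scala's MurmurHash3.stringHash,
  // which is NOT the exact hash Spark uses, so indices differ from Spark's output.
  def bucket(term: String, numFeatures: Int): Int = {
    val raw = MurmurHash3.stringHash(term) % numFeatures
    if (raw < 0) raw + numFeatures else raw // non-negative modulo
  }

  // Sparse term-frequency vector as bucket index -> count (the hashing trick).
  def termFrequencies(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
    terms.groupBy(t => bucket(t, numFeatures)).map { case (idx, ts) => idx -> ts.size.toDouble }

  def main(args: Array[String]): Unit = {
    val numFeatures = 1000
    for (text <- Seq("spark i j k", "spark hadoop spark")) {
      val words = tokenize(text)
      val tf = termFrequencies(words, numFeatures)
      val wordStr = words.mkString(",")
      val tfStr = tf.toSeq.sortBy(_._1).mkString(" ")
      println(s"$text -> words=$wordStr features=$tfStr")
    }
  }
}
```

Different terms can collide in the same bucket. A larger numFeatures lowers the chance of collisions, at the cost of longer (though still sparse) feature vectors.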