
ERROR InsertIntoHadoopFsRelationCommand: Aborting job. ...please set spark.sql.crossJoin.enabled

2018-01-18 10:51
Here is the error message:
18/01/18 10:28:00 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;
at org.apache.spark.sql.execution.joins.CartesianProductExec.doPrepare(CartesianProductExec.scala:96)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:573)
at wangsheng.sibat.highway.cal$.saveFile$1(cal.scala:50)
at wangsheng.sibat.highway.cal$.main(cal.scala:47)
at wangsheng.sibat.highway.cal.main(cal.scala)
Here is my code:
val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID","InRoadNo").count().toDF("carID","InRoadNo","count").join(GPSdata,$"InRoadNo" === "Road")
.toDF("carID","InRoadNo","count","InRoad","Node","InRoadName","NodeName")错误提示中有提到
please set spark.sql.crossJoin.enabled = true
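This flag lets Spark SQL execute cartesian (cross) joins, which are disabled by default because they can be prohibitively expensive. If a cross join were actually intended, the flag could be set when the session is built, for example (a minimal sketch; the variable name spark and the app/master settings are assumptions, not from the original code):

import org.apache.spark.sql.SparkSession

// Sketch only: allow cartesian joins globally. Use this only when the cross
// join is intentional; otherwise it merely hides a bad join condition.
val spark = SparkSession.builder()
  .appName("crossjoin-demo")
  .master("local[*]")
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()

In my case, however, the cartesian product was not intentional, so enabling the flag would only mask the real problem.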
The error points to a join, so I focused on where the join goes wrong. In the join condition, the column name on the right-hand side was not prefixed with the $ sign, and no join type was specified. After adding both:
val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID","InRoadNo").count().toDF("carID","InRoadNo","count").join(GPSdata,$"InRoadNo" === $"Road","inner")
.toDF("carID","InRoadNo","count","InRoad","Node","InRoadName","NodeName")运行OK