ERROR InsertIntoHadoopFsRelationCommand: Aborting job. ...please set spark.sql.crossJoin.enabled
2018-01-18 10:51
Here is the error message:
18/01/18 10:28:00 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;
at org.apache.spark.sql.execution.joins.CartesianProductExec.doPrepare(CartesianProductExec.scala:96)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:573)
at wangsheng.sibat.highway.cal$.saveFile$1(cal.scala:50)
at wangsheng.sibat.highway.cal$.main(cal.scala:47)
at wangsheng.sibat.highway.cal.main(cal.scala)

Looking at my code:

val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID","InRoadNo").count().toDF("carID","InRoadNo","count").join(GPSdata,$"InRoadNo" === "Road")
  .toDF("carID","InRoadNo","count","InRoad","Node","InRoadName","NodeName")

The error message says "please set spark.sql.crossJoin.enabled = true", so it is a join problem, and I focused on finding where the join went wrong. In the join condition I had not prefixed the column name with the $ symbol, so "Road" was a string literal rather than a column of GPSdata; the condition did not relate the two DataFrames, and Spark planned a Cartesian product. I had also not specified a join type. After adding both:

val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID","InRoadNo").count().toDF("carID","InRoadNo","count").join(GPSdata,$"InRoadNo" === $"Road","inner")
  .toDF("carID","InRoadNo","count","InRoad","Node","InRoadName","NodeName")

This runs OK.
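Fixing the join condition is the right solution here, but the setting the error message suggests also exists for cases where a Cartesian product is genuinely intended. A minimal sketch, assuming Spark 2.x; "HighwayCal" is a placeholder app name:

```scala
import org.apache.spark.sql.SparkSession

// Enable cross joins for the whole session (the Spark 2.x default is false).
val spark = SparkSession.builder()
  .appName("HighwayCal") // placeholder name
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()

// Alternatively, from Spark 2.1 on, request the Cartesian product explicitly;
// Dataset.crossJoin works even with spark.sql.crossJoin.enabled left at false:
// val product = data2.crossJoin(GPSdata)
```

Only reach for this when the Cartesian product is intentional: with an unrelated condition like $"InRoadNo" === "Road", enabling cross joins would silently produce every row combination instead of the intended equi-join.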