Apache Zeppelin & Spark JSON Parsing Exception
2016-05-21 00:36
I downloaded Apache Zeppelin 0.6.0-incubating-SNAPSHOT and Apache Spark 1.5.2. While using SQLContext in Zeppelin to read a JSON file and build a DataFrame, the following code ran into an exception:
```scala
// Read a JSON file from HDFS into a DataFrame and register it as a temp table
val profilesJsonRdd = sqlc.jsonFile("hdfs://www.blog.com/tmp/json")
val profileDF = profilesJsonRdd.toDF()
profileDF.printSchema()
profileDF.show()
profileDF.registerTempTable("profiles")
```

The very first statement already fails with:
```
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"hadoopRDD"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDD.<init>(RDD.scala:1603)
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:101)
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:122)
at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:996)
at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:992)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:992)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd(JSONRelation.scala:92)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:106)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:219)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:1065)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
at $iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC.<init>(<console>:38)
at $iwC.<init>(<console>:40)
at <init>(<console>:42)
at .<init>(<console>:46)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:713)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:678)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:671)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
```
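The frame to focus on is `org.apache.spark.rdd.RDDOperationScope$.fromJson` (RDDOperationScope.scala:82): every time Spark builds an RDD it deserializes a small scope descriptor like `{"id":"0","name":"hadoopRDD"}` with Jackson plus jackson-module-scala. Below is a minimal sketch of that round trip, assuming matching jackson-databind and jackson-module-scala jars are on the classpath; `Scope` is a hypothetical stand-in for Spark's private `RDDOperationScope` class:

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical stand-in for org.apache.spark.rdd.RDDOperationScope
case class Scope(id: String, name: String)

object ScopeRoundTrip extends App {
  // Same setup Spark uses internally: an ObjectMapper with the Scala module
  val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  // Succeeds when the jackson-databind and jackson-module-scala versions
  // match; pairing a 2.5.x databind with a 2.4.x module-scala fails with
  // exactly "Could not find creator property with name 'id'".
  val scope = mapper.readValue("""{"id":"0","name":"hadoopRDD"}""", classOf[Scope])
  println(scope) // Scope(0,hadoopRDD)
}
```

In other words, the code itself is fine; the failure is a classpath problem.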
Some analysis turned up the cause: the Jackson artifacts that Apache Zeppelin 0.6.0-incubating-SNAPSHOT depends on are at version 2.5.x (see the README.md bundled with it):

```
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-core:2.5.3 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-annotations:2.5.0 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-databind:2.5.3 - https://github.com/FasterXML/jackson-core)
```

Apache Spark 1.5.2, on the other hand, depends on Jackson 2.4.4 (see Spark's pom.xml):

```
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
```

The jsonFile call uses these Jackson classes while parsing, which is where the two versions collide. When Zeppelin starts, it loads the libraries under ${ZEPPELIN_HOME}/zeppelin-server/target/lib, and that directory contains the Jackson jars:

```
-rw-r--r-- 1 blog blog   39815 Jan 20 16:35 jackson-annotations-2.5.0.jar
-rw-r--r-- 1 blog blog  229998 Jan 20 16:35 jackson-core-2.5.3.jar
-rw-r--r-- 1 blog blog 1143162 Jan 20 16:35 jackson-databind-2.5.3.jar
```

This is exactly where the problem lies, so we can replace all three jars with their 2.4.4 counterparts:

```
-rw-r--r-- 1 blog blog   38597 Nov 25  2014 jackson-annotations-2.4.4.jar
-rw-r--r-- 1 blog blog  225302 Nov 25  2014 jackson-core-2.4.4.jar
-rw-r--r-- 1 blog blog 1076926 Nov 25  2014 jackson-databind-2.4.4.jar
```

Then restart Zeppelin and rerun the statements above; the exception no longer occurs. I had originally wanted to change the Jackson version directly in the pom.xml and rebuild, but after searching every pom.xml in the project I could not find these three artifacts declared anywhere.
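After swapping the jars, a quick way to confirm which Jackson the interpreter actually loaded is to print the runtime versions and the jar each class came from. This is a plain-reflection sketch (nothing Zeppelin-specific) that can be pasted into a notebook paragraph:

```scala
// Report the jackson-core and jackson-databind versions on the classpath
println(com.fasterxml.jackson.core.json.PackageVersion.VERSION)    // expect 2.4.4
println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION) // expect 2.4.4

// Report which jar file ObjectMapper was actually loaded from
println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation)
```

If the first two lines print 2.5.x, the old jars are still on the classpath and the replacement did not take effect.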