
Apache Zeppelin & Spark JSON Parsing Exception

2016-05-21 00:36
The Apache Zeppelin and Apache Spark versions in use were 0.6.0-incubating-SNAPSHOT and 1.5.2, respectively. While using SQLContext in Zeppelin to read a JSON file and create a DataFrame, the following exception occurred:

val profilesJsonRdd = sqlc.jsonFile("hdfs://www.blog.com/tmp/json")
val profileDF = profilesJsonRdd.toDF()
profileDF.printSchema()
profileDF.show()
profileDF.registerTempTable("profiles")

com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"hadoopRDD"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDD.<init>(RDD.scala:1603)
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:101)
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:122)
at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:996)
at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:992)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:992)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd(JSONRelation.scala:92)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:106)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:219)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:1065)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
at $iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC.<init>(<console>:38)
at $iwC.<init>(<console>:40)
at <init>(<console>:42)
at .<init>(<console>:46)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:713)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:678)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:671)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

The exception was thrown as soon as the first line of code ran. After some investigation, it turned out that Apache Zeppelin 0.6.0-incubating-SNAPSHOT depends on Jackson 2.5.x (see the README.md bundled with it), as follows:
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-core:2.5.3 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-annotations:2.5.0 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-databind:2.5.3 - https://github.com/FasterXML/jackson-core)

Apache Spark 1.5.2, however, depends on Jackson 2.4.4 (see Spark's pom.xml):
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
The jsonFile function uses these classes during parsing, which caused the conflict. On startup, Apache Zeppelin loads the libraries under ${ZEPPELIN_HOME}/zeppelin-server/target/lib, and among them are the following Jackson jars:
-rw-r--r-- 1 blog blog   39815 Jan 20 16:35 jackson-annotations-2.5.0.jar
-rw-r--r-- 1 blog blog  229998 Jan 20 16:35 jackson-core-2.5.3.jar
-rw-r--r-- 1 blog blog 1143162 Jan 20 16:35 jackson-databind-2.5.3.jar
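To see at a glance which Jackson versions a directory contains, a small helper like the following can parse the versions out of the jar file names. This is just an illustrative sketch: the function name `jackson_versions` and the assumption that jars are named `<artifact>-<version>.jar` are mine, not from the article.

```shell
# jackson_versions DIR: print "artifact version" for every Jackson jar
# in DIR, parsed from file names of the form <artifact>-<version>.jar.
jackson_versions() {
  for jar in "$1"/jackson-*.jar; do
    [ -e "$jar" ] || continue          # skip if the glob matched nothing
    base=$(basename "$jar" .jar)       # e.g. jackson-core-2.5.3
    printf '%s %s\n' "${base%-*}" "${base##*-}"
  done
}
```

Running it against both ${ZEPPELIN_HOME}/zeppelin-server/target/lib and Spark's own lib directory makes the 2.5.x vs 2.4.4 mismatch obvious.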
This is where the problem lies, so we can replace all three jars with their 2.4.4 versions:
-rw-r--r-- 1 blog blog   38597 Nov 25  2014 jackson-annotations-2.4.4.jar
-rw-r--r-- 1 blog blog  225302 Nov 25  2014 jackson-core-2.4.4.jar
-rw-r--r-- 1 blog blog 1076926 Nov 25  2014 jackson-databind-2.4.4.jar
Then restart Zeppelin and run the statements above again; the problem no longer occurs. I had originally intended to change the Jackson version directly in pom.xml and rebuild, but after searching every pom.xml in the project I could not find the three jars above.
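The manual jar swap can be sketched as a small shell function. `swap_jackson` is a hypothetical name, and the sketch assumes the three 2.4.4 jars have already been downloaded (e.g. from Maven Central) into a local directory:

```shell
# swap_jackson LIB_DIR JAR_DIR: back up the Jackson 2.5.x jars found in
# LIB_DIR and copy in the 2.4.4 jars from JAR_DIR.
swap_jackson() {
  lib_dir="$1"
  jar_dir="$2"
  # Move the originals aside instead of deleting them, so the change
  # is easy to revert.
  mkdir -p "$lib_dir/jackson-backup"
  mv "$lib_dir/jackson-annotations-2.5.0.jar" \
     "$lib_dir/jackson-core-2.5.3.jar" \
     "$lib_dir/jackson-databind-2.5.3.jar" \
     "$lib_dir/jackson-backup/"
  cp "$jar_dir/jackson-annotations-2.4.4.jar" \
     "$jar_dir/jackson-core-2.4.4.jar" \
     "$jar_dir/jackson-databind-2.4.4.jar" \
     "$lib_dir/"
}
```

After calling it as `swap_jackson "$ZEPPELIN_HOME/zeppelin-server/target/lib" ~/jackson-2.4.4`, restart Zeppelin with `"$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" restart`.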
Tags: Zeppelin