
SparkR fails to read CSV files with java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String

2016-04-07 23:31
Start the SparkR shell with the following command:

bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

Then read in the CSV file:

flights <- read.df(sqlContext, "/sparktest/nycflights13.csv", "com.databricks.spark.csv", header="true")

head(flights)

This fails with:

16/04/07 23:06:46 ERROR CsvRelation$: Exception while parsing line: 2013,1,1,914,-6,1244,4,"AA","N517AA",1589,"EWR","DFW",238,1372,9,14.
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
    at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getUTF8String(rows.scala:248)
    at org.apache.spark.sql.catalyst.expressions.BoundReference.eval(BoundAttribute.scala:49)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:295)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:84)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:60)
    at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:150)
    at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:130)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

Cause:

The wrong version of the spark-csv package was loaded when reading the CSV file: com.databricks:spark-csv_2.10:1.0.3. spark-csv 1.0.3 was built against the internals of older Spark releases; from Spark 1.5 onward, Spark SQL stores strings in internal rows as org.apache.spark.unsafe.types.UTF8String rather than java.lang.String, which is what triggers the ClassCastException above.

Solution:

Relaunch the SparkR shell with a newer version of the package:

bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0
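
With the newer package loaded, the original read should go through. A minimal end-to-end sketch in the SparkR shell (the file path is the one from the example above; inferSchema is an optional spark-csv setting, added here only for illustration):

flights <- read.df(sqlContext, "/sparktest/nycflights13.csv",
                   source = "com.databricks.spark.csv",
                   header = "true", inferSchema = "true")
printSchema(flights)  # check that the columns were parsed
head(flights)         # now returns the first rows instead of throwing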


