SparkR error reading CSV files: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
2016-04-07 23:31
Start the SparkR shell with the following command:
bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3
Then read the CSV file:
flights <- read.df(sqlContext, "/sparktest/nycflights13.csv", "com.databricks.spark.csv", header="true")
head(flights)
This fails with:
16/04/07 23:06:46 ERROR CsvRelation$: Exception while parsing line: 2013,1,1,914,-6,1244,4,"AA","N517AA",1589,"EWR","DFW",238,1372,9,14.
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getUTF8String(rows.scala:248)
at org.apache.spark.sql.catalyst.expressions.BoundReference.eval(BoundAttribute.scala:49)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:295)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:84)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:60)
at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:150)
at com.databricks.spark.csv.CsvRelation$$anonfun$com$databricks$spark$csv$CsvRelation$$parseCSV$1.apply(CsvRelation.scala:130)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
Cause:
The spark-csv package loaded at startup, com.databricks:spark-csv_2.10:1.0.3, is too old for this Spark version. It was built against the pre-1.5 Spark API, where string columns in catalyst rows were plain java.lang.String values; Spark 1.5+ stores strings internally as org.apache.spark.unsafe.types.UTF8String, hence the ClassCastException when the old package hands a String to the new row machinery.
Solution:
Restart the SparkR shell with a spark-csv version that matches the Spark release:
bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0
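With the matching package version, the same session should run cleanly. A minimal sketch of the full workflow (assuming the same file path as above; the inferSchema option and the printSchema call are additions for illustration, not part of the original session):

```r
# Launched with: bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0
# The SparkR shell creates sqlContext automatically.

# Read the CSV through the spark-csv data source.
# header="true" uses the first line as column names;
# inferSchema="true" (optional) asks spark-csv to guess column types
# instead of reading everything as strings.
flights <- read.df(sqlContext, "/sparktest/nycflights13.csv",
                   source = "com.databricks.spark.csv",
                   header = "true", inferSchema = "true")

# Inspect the inferred schema and the first rows.
printSchema(flights)
head(flights)
```

This runs only inside a SparkR shell with the package on the classpath, so there is no standalone way to execute it; the key point is simply that the spark-csv artifact version must match the Spark release.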