
Spark Q&A: CSVFormat NoClassDefFoundError when reading CSV files with databricks spark-csv

2017-05-23 11:19
Q: When Spark reads a CSV file via the databricks spark-csv package, the job fails with
java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
and the CSVFormat class cannot be found.

A: According to kevinskii's answer on GitHub, the problem is that the spark-csv jar does not bundle the CSVFormat dependency. The fix is to download the commons-csv jar and add it to the spark-submit job via the --jars option.

It seems that the org/apache/commons/csv/CSVFormat dependency isn't being packaged in the spark-csv jar file. Downloading the binary from https://commons.apache.org/proper/commons-csv/download_csv.cgi, extracting the .jar from it and setting the permissions, and finally including it in the list of comma-separated JAR files following the "--jars" option when running the Spark shell solved it for me.

Example:

bin/pyspark --jars /path/to/spark-csv.jar,/path/to/commons-csv.jar
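
The same --jars option works when submitting a job rather than starting a shell; a minimal sketch, with the jar and application paths as placeholder assumptions:

bin/spark-submit --jars /path/to/spark-csv.jar,/path/to/commons-csv.jar /path/to/your_app.py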

In addition, m-mashaye on Stack Overflow gave a workaround that reads the csv file with textFile and builds the DataFrame through a case class, suited to anyone who has tried every other fix without success.

Instead of using sqlContext.read, I used the following code to turn my .csv file into a DataFrame. Suppose the .csv file has 5 columns as follows:

// Define a case class describing the five CSV columns
case class Flight(arrDelay: Int, depDelay: Int, origin: String, dest: String, distance: Int)

// Read the file as plain text, split each line on commas,
// and map the fields into Flight instances before converting to a DataFrame
import sqlContext.implicits._ // already in scope in the spark-shell; needed in a standalone app
val flights = sc.textFile("2008.csv")
  .map(_.split(","))
  .map(p => Flight(p(0).trim.toInt, p(1).trim.toInt, p(2), p(3), p(4).trim.toInt))
  .toDF()
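
A quick way to verify the conversion, using the flights DataFrame built above:

flights.printSchema()
flights.show(5)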