ClassNotFoundException: Failed to find data source: jdbc
2018-02-07 15:38
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: jdbc. Please find packages
at http://spark.apache.org/third-party-projects.html
The question is almost a duplicate of "Why does format("kafka") fail with 'Failed to find data source: kafka.' with uber-jar?", with the difference that the other OP used Apache Maven to create an uber-jar while here it's about sbt (the sbt-assembly plugin's configuration, to be precise).
The short name (aka alias) of a data source, e.g. jdbc or kafka, is only available if the corresponding META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file registers a DataSourceRegister implementation.
For the jdbc alias to work, Spark SQL uses META-INF/services/org.apache.spark.sql.sources.DataSourceRegister with the following entry (there are others):

org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
That's what ties the jdbc alias up with the data source. And you've excluded it from your uber-jar with the following assemblyMergeStrategy:

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Note the case PathList("META-INF", xs @ _*) pattern, to which you simply apply MergeStrategy.discard. That's the root cause: everything under META-INF, including the data source registrations, gets dropped from the uber-jar.
Just to check that the "infrastructure" is available and you could use the jdbc data source by its fully-qualified name (not the alias), try this:

spark.read.
  format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider").
  load("jdbc:postgresql://localhost/testdb")
You will see other problems due to missing options like url, but... we're digressing.
A solution is to MergeStrategy.concat all META-INF/services/org.apache.spark.sql.sources.DataSourceRegister files (that would create an uber-jar with all data sources, incl. the jdbc data source):

case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
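To see why the merge strategy matters, here is a minimal shell sketch that simulates what MergeStrategy.first and MergeStrategy.concat each produce for the service file. The jar1 and jar2 directories are hypothetical stand-ins for two dependency jars; the service-file name and provider classes are the real ones discussed above:

```shell
# Two dependency jars (stand-ins) that both ship the same service file.
svc="org.apache.spark.sql.sources.DataSourceRegister"
mkdir -p jar1/META-INF/services jar2/META-INF/services
printf '%s\n' "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider" \
  > "jar1/META-INF/services/$svc"
printf '%s\n' "org.apache.spark.sql.kafka010.KafkaSourceProvider" \
  > "jar2/META-INF/services/$svc"

# MergeStrategy.first keeps only one copy: the kafka registration is lost.
cp "jar1/META-INF/services/$svc" merged-first

# MergeStrategy.concat concatenates all copies: both registrations survive.
cat "jar1/META-INF/services/$svc" "jar2/META-INF/services/$svc" > merged-concat

grep -c Provider merged-first    # prints 1
grep -c Provider merged-concat   # prints 2
```

With first (or discard), whichever provider happens to lose the merge can no longer be resolved by its alias at runtime, which is exactly the ClassNotFoundException above.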
The kafka data source is an external module and is not available to Spark applications by default. You have to define it as a dependency in your pom.xml (as you have done), but that's just the very first step to have it in your Spark application.
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
With that dependency you have to decide whether you want to create a so-called uber-jar that bundles all the dependencies together (which results in a fairly big jar file and makes the submission time longer), or use the --packages option (or the less flexible --jars) to add the dependency at spark-submit time. (There are other options, like storing the required jars on Hadoop HDFS or using Hadoop distribution-specific ways of defining dependencies for Spark applications, but let's keep things simple.)
I'd recommend using --packages first, and only when it works consider the other options.
Use spark-submit --packages to include the spark-sql-kafka-0-10 module as follows:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0
Include the other command-line options as you wish.
Uber-Jar Approach
Including all the dependencies in a so-called uber-jar may notalways work due to how
META-INFdirectories
are handled.
For the kafka data source to work (and other data sources in general), you have to ensure that the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister files of all the data sources are merged (not replace or first or whatever strategy you use).
The kafka data source uses its own META-INF/services/org.apache.spark.sql.sources.DataSourceRegister that registers org.apache.spark.sql.kafka010.KafkaSourceProvider as the data source provider for the kafka format.
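A quick way to verify that your assembled jar kept the kafka registration is to print the service file from the jar, e.g. unzip -p your-assembly.jar META-INF/services/org.apache.spark.sql.sources.DataSourceRegister (the jar name is whatever sbt-assembly produced for you). The sketch below runs the same check against a hypothetical exploded uber-jar directory, assuming the merge succeeded:

```shell
# Hypothetical exploded uber-jar after a successful MergeStrategy.concat.
svc="META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"
mkdir -p app-assembly/META-INF/services
printf '%s\n' \
  "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider" \
  "org.apache.spark.sql.kafka010.KafkaSourceProvider" \
  > "app-assembly/$svc"

# format("kafka") can only resolve if the kafka provider line is present.
if grep -q "KafkaSourceProvider" "app-assembly/$svc"; then
  echo "kafka data source registered"
else
  echo "kafka data source missing -- check your assemblyMergeStrategy"
fi
```

If the KafkaSourceProvider line is missing from your real jar, the merge strategy discarded or overwrote it, and format("kafka") will fail exactly as shown at the top of this post.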