Analyzing and Fixing "Unable to find encoder for type stored in a Dataset" in Spark 2.0 DataFrame map Operations
2016-10-20 12:17
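Now that the newer Spark releases have become reasonably stable, I recently set out to upgrade our existing framework to Spark 2.0. I was quite excited about it, especially because SQL queries really are much faster.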
One operation, however, got me stuck: dataframe.map. It ran fine on Spark 1.x but no longer compiles on Spark 2.0.
The reported problem was essentially this:
error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    resDf_upd.map(row => {
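There was not much material on this problem online. My guess was that it had to be a new requirement that came with the Dataset unification: in Spark 2.0 a DataFrame is a Dataset[Row], so map now returns a Dataset and needs an implicit Encoder for the result type, whereas in Spark 1.x DataFrame.map returned an RDD.
Going through the Spark documentation, I found the following description of Dataset: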
Dataset is Spark SQL's strongly-typed API for working with structured data, i.e. records with a known schema. Datasets are lazy and structured query expressions are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation query required to produce the data (for a given Spark SQL session).
A Dataset is a result of executing a query expression against data storage like files, Hive tables or JDBC databases. The structured query expression can be described by a SQL query, a Column-based SQL expression or a Scala/Java lambda function. And that is why Dataset operations are available in three variants.
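To make the "three variants" in the quoted description concrete, here is a minimal sketch that expresses the same query as a SQL string, as a Column-based expression, and as a Scala lambda. The `Person` case class, the `people` view name, the sample columns and the object name are assumptions made up for illustration. Nothing is computed until the `collect()` action is called, and the lambda variant is the one that needs encoders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type matching the sample columns (name, age).
case class Person(name: String, age: Int)

object ThreeVariantsSketch {
  // Variant 1: a SQL query against a temporary view.
  def viaSql(spark: SparkSession, df: DataFrame): Array[String] = {
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 20").collect().map(_.getString(0))
  }

  // Variant 2: a Column-based expression.
  def viaColumns(df: DataFrame): Array[String] =
    df.where(df("age") > 20).select("name").collect().map(_.getString(0))

  // Variant 3: Scala lambdas on a typed Dataset; the encoders for Person
  // and String come from spark.implicits._
  def viaLambda(spark: SparkSession, df: DataFrame): Array[String] = {
    import spark.implicits._
    df.as[Person].filter(p => p.age > 20).map(p => p.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("three-variants").getOrCreate()
    import spark.implicits._
    val df = Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")
    viaSql(spark, df).foreach(println)
    viaColumns(df).foreach(println)
    viaLambda(spark, df).foreach(println)
    spark.stop()
  }
}
```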
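From this it follows that operating on a Dataset requires a matching encoder. The example given in the official documentation makes this especially clear: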
    // No pre-defined encoders for Dataset[Map[K,V]], define explicitly
    implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
    // Primitive types and case classes can be also defined as
    // implicit val stringIntMapEncoder: Encoder[Map[String, Any]] = ExpressionEncoder()

    // row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
    teenagersDF.map(teenager => teenager.getValuesMap[Any](List("name", "age"))).collect()
    // Array(Map("name" -> "Justin", "age" -> 19))
In other words, before a map operation can be used, an Encoder for its result type has to be defined.
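As a rough illustration of what "defining an Encoder" can look like, the sketch below passes an encoder explicitly to map for three kinds of result type. The `NameLength` case class, the sample columns and the object name are assumptions invented for this example. For the first two cases, importing spark.implicits._ would supply the same encoders implicitly, as the error message suggests.

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession}

// Hypothetical result type; defined at top level so an encoder can be derived.
case class NameLength(name: String, length: Int)

object EncoderSketch {
  // Primitive-like result (String): a built-in encoder exists.
  def names(df: DataFrame): Array[String] =
    df.map(row => row.getAs[String]("name"))(Encoders.STRING).collect()

  // Product result (case class): the encoder is derived from the class.
  def nameLengths(df: DataFrame): Array[NameLength] =
    df.map { row =>
      val name = row.getAs[String]("name")
      NameLength(name, name.length)
    }(Encoders.product[NameLength]).collect()

  // Map[String, Any] result: fall back to a Kryo-based encoder,
  // as in the documentation example above.
  def valueMaps(df: DataFrame): Array[Map[String, Any]] = {
    val mapEncoder: Encoder[Map[String, Any]] = Encoders.kryo[Map[String, Any]]
    df.map(row => row.getValuesMap[Any](List("name", "age")))(mapEncoder).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()
    import spark.implicits._ // needed here only for toDF
    val df = Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")
    names(df).foreach(println)
    nameLengths(df).foreach(println)
    valueMaps(df).foreach(println)
    spark.stop()
  }
}
```

A Kryo encoder stores each value as a single binary column, so it suits results that are collected or passed on as opaque objects rather than queried by column.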
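That adds a lot of tedious work to the upgrade. Fortunately there is a simpler route: a Dataset can also be converted to an RDD, so it is enough to change each former dataframe.map call to dataframe.rdd.map.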
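The workaround can be sketched as follows. The column names, the `nextAge` field and the object name are assumptions for illustration only. The first function shows the plain `.rdd.map` change, which needs no Encoder. The second shows one way to get back to a DataFrame afterwards by supplying a schema explicitly, in case later code still expects one.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RddMapSketch {
  // Spark 1.x style: map over the underlying RDD[Row]; any result type
  // is allowed and no Encoder is involved.
  def toMaps(df: DataFrame): RDD[Map[String, Any]] =
    df.rdd.map(row => row.getValuesMap[Any](List("name", "age")))

  // Optional: rebuild a DataFrame from the mapped rows with an explicit schema.
  def withNextAge(spark: SparkSession, df: DataFrame): DataFrame = {
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("nextAge", IntegerType, nullable = false)))
    val rows: RDD[Row] =
      df.rdd.map(row => Row(row.getAs[String]("name"), row.getAs[Int]("age") + 1))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-map-sketch").getOrCreate()
    import spark.implicits._ // needed here only for toDF
    val df = Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")
    toMaps(df).collect().foreach(println)
    withNextAge(spark, df).show()
    spark.stop()
  }
}
```

The trade-off is that code running on the RDD works with plain Row objects outside the Dataset API, so this is mainly a way to keep the migration effort small.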