MLlib: Exception in thread "main" org.apache.spark.SparkException: Input validation failed.
2015-04-09 22:31
When using MLlib for classification with logistic regression or a linear support vector machine, you may run into the following error:
15/04/09 21:27:25 ERROR DataValidators: Classification labels should be 0 or 1. Found 3000000 invalid labels
Exception in thread "main" org.apache.spark.SparkException: Input validation failed.
Because the debug output during development is verbose, it is easy to overlook the ERROR line above and focus only on "Input validation failed". Searching the source code, the relevant data-validation logic is:
// Check the data properties before running the optimizer
if (validateData && !validators.forall(func => func(input))) {
throw new SparkException("Input validation failed.")
}
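The validator that actually produces the ERROR line is, in Spark 1.x, DataValidators.binaryLabelValidator (in org.apache.spark.mllib.util). A rough paraphrase of that check:

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Roughly what DataValidators.binaryLabelValidator does in Spark 1.x:
// count labels outside {0.0, 1.0}, log the ERROR, and fail validation
// if any are found.
val binaryLabelValidator: RDD[LabeledPoint] => Boolean = { data =>
  val numInvalid = data.filter(x => x.label != 1.0 && x.label != 0.0).count()
  if (numInvalid != 0) {
    logError(s"Classification labels should be 0 or 1. Found $numInvalid invalid labels")
  }
  numInvalid == 0
}
```

So "Input validation failed" is just the generic wrapper; the real diagnosis is in the ERROR line logged by the validator.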
The source alone does not suggest a fix; only later did I notice the ERROR line above the exception. The message means that classification labels must be 0 or 1, nothing else. My class labels included the value 2 — exactly 3,000,000 records — so I remapped the labels to 0 and 1, and the job then ran without the validation error.
This also illustrates why the linear support vector machine is said to be a binary classifier. Of course, the algorithm can be extended to handle three classes; plenty of implementations are available online.
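The remapping described above can be sketched as follows. This is a minimal, hypothetical example: the RDD name `data` and the choice of treating label 2 as the positive class are assumptions, not from the original post.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Assumption: `data` is an RDD[LabeledPoint] whose labels include 2.0.
// SVMWithSGD (like LogisticRegressionWithSGD) requires labels in {0.0, 1.0},
// so remap label 2.0 to 1.0 and everything else to 0.0 before training.
val binaryData = data.map { p =>
  val label = if (p.label == 2.0) 1.0 else 0.0
  LabeledPoint(label, p.features)
}.cache()

val model = SVMWithSGD.train(binaryData, 100) // 100 iterations
```

For a genuine three-class problem, a common alternative is one-vs-rest: train one binary model per class with this kind of remapping and pick the class whose model scores highest.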