Reading Files in Spark

2015-07-04 12:05

1. textFile:

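Its definition is: def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]. It reads a text file from HDFS, the local file system, or any Hadoop-supported file system URI, and returns an RDD of Strings with one element per line.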
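As a minimal sketch of the call above, the following reads a small temporary file line by line. The local master, the application name, the temporary file, and the names TextFileSketch and readLines are assumptions made for illustration, not part of the original post.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {
  // Hypothetical helper: reads the file at `path` and returns its lines.
  // collect() pulls everything to the driver, so this only suits small inputs.
  def readLines(sc: SparkContext, path: String): Array[String] =
    sc.textFile(path, minPartitions = 2).collect()

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for this sketch; on a cluster the master
    // would normally be supplied by spark-submit.
    val conf = new SparkConf().setMaster("local[2]").setAppName("textFile-sketch")
    val sc = new SparkContext(conf)
    try {
      // Write a throwaway input file so the example is self-contained.
      val input = Files.createTempFile("textfile-sketch", ".txt")
      Files.write(input, "first line\nsecond line\nthird line\n".getBytes(UTF_8))

      // Each line of the file becomes one String element of the RDD.
      readLines(sc, input.toUri.toString).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The same call accepts an hdfs:// URI in place of the local file URI; minPartitions is only a lower-bound hint for how the input is split.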

2. wholeTextFiles:

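Its definition is: def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions): RDD[(String, String)]. It reads a directory of text files and returns each file as a single (path, content) pair. For example, given the following files: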

hdfs://a-hdfs-path/part-00000

hdfs://a-hdfs-path/part-00001

…

hdfs://a-hdfs-path/part-nnnnn

Read them with:

val rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")

The RDD then contains:

(a-hdfs-path/part-00000, its content)

(a-hdfs-path/part-00001, its content)

…

(a-hdfs-path/part-nnnnn, its content)
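The (path, content) pairs listed above can be reproduced locally with a rough sketch like the one below. The temporary directory, the two sample files, the local master, and the names WholeTextFilesSketch and fileLengths are assumptions for illustration only.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WholeTextFilesSketch {
  // Hypothetical helper: maps every file under `dir` to (path, character count).
  // Each file arrives as ONE record, so the whole content is a single String.
  def fileLengths(sc: SparkContext, dir: String): RDD[(String, Int)] =
    sc.wholeTextFiles(dir).map { case (path, content) => (path, content.length) }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("wholeTextFiles-sketch")
    val sc = new SparkContext(conf)
    try {
      // Create a small directory of part files so the example is self-contained.
      val dir = Files.createTempDirectory("whole-text-files-sketch")
      Files.write(dir.resolve("part-00000"), "alpha\nbeta\n".getBytes(UTF_8))
      Files.write(dir.resolve("part-00001"), "gamma\n".getBytes(UTF_8))

      fileLengths(sc, dir.toUri.toString).collect().foreach {
        case (path, n) => println(s"$path -> $n characters")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Because every file is loaded whole into one record, this method fits many small files better than a few very large ones.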

3. binaryFiles:

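Reads binary files. It is used the same way as wholeTextFiles, except that each file's content is returned as a PortableDataStream rather than a String, so the result is an RDD[(String, PortableDataStream)].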
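A minimal sketch of reading binary files follows, assuming a local master and a temporary directory with two small sample files. The names BinaryFilesSketch and fileBytes are hypothetical; toArray() on PortableDataStream reads the whole file into memory, so this only suits small files.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BinaryFilesSketch {
  // Hypothetical helper: maps every file under `dir` to (path, raw bytes).
  // binaryFiles yields (path, PortableDataStream); toArray() materializes the bytes.
  def fileBytes(sc: SparkContext, dir: String): RDD[(String, Array[Byte])] =
    sc.binaryFiles(dir).map { case (path, stream) => (path, stream.toArray()) }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("binaryFiles-sketch")
    val sc = new SparkContext(conf)
    try {
      // Create two small binary files so the example is self-contained.
      val dir = Files.createTempDirectory("binary-files-sketch")
      Files.write(dir.resolve("a.bin"), Array[Byte](0, 1, 2, 3))
      Files.write(dir.resolve("b.bin"), Array[Byte](-1, 127))

      fileBytes(sc, dir.toUri.toString).collect().foreach {
        case (path, bytes) => println(s"$path -> ${bytes.length} bytes")
      }
    } finally {
      sc.stop()
    }
  }
}
```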
