Integrating Tachyon with HDFS and Spark
2015-09-14 17:50
Tachyon 0.7.1 pseudo-distributed cluster installation and testing:
/article/1810534.html
According to the official documentation, Spark 1.4.x is compatible with Tachyon 0.6.4, while the latest Tachyon 0.7.1 is compatible with Spark 1.5.x. The versions used here are Spark 1.4.1 and Tachyon 0.7.1.
Integrating Tachyon with HDFS
Edit conf/tachyon-env.sh and set the under filesystem address:

export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020

and, in TACHYON_JAVA_OPTS, point the data folder at the under filesystem:

-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
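Put together, the relevant fragment of conf/tachyon-env.sh would look roughly like this (a sketch based on the default Tachyon 0.7 template; the hdfs://master:8020 address and the /tmp/tachyon/data path are this cluster's values):

```shell
# Under filesystem: persist Tachyon data into this HDFS cluster
export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020

# Keep Tachyon's data folder on the under filesystem
# (appended to the JVM options Tachyon starts with)
export TACHYON_JAVA_OPTS+="
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
"
```

After changing this file, the Tachyon master and workers need to be restarted for the new under-FS address to take effect.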
Upload the files to HDFS
hadoop fs -put /home/cluster/data/test/bank/ /data/spark/
hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
Read /data/spark/bank/bank-full.csv through Tachyon
val bankFullFile = sc.textFile("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
2015-09-11 20:08:20,136 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177384) called with curMem=630803, maxMem=257918238
2015-09-11 20:08:20,137 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3 stored as values in memory (estimated size 173.2 KB, free 245.2 MB)
2015-09-11 20:08:20,154 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(17665) called with curMem=808187, maxMem=257918238
2015-09-11 20:08:20,155 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.3 KB, free 245.2 MB)
2015-09-11 20:08:20,156 INFO [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_3_piece0 in memory on localhost:41040 (size: 17.3 KB, free: 245.9 MB)
2015-09-11 20:08:20,157 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:21
bankFullFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:21
count
bankFullFile.count()

However, this fails with the following warnings:

2015-09-11 21:34:31,494 WARN [Executor task launch worker-6] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN [Executor task launch worker-6] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN [Executor task launch worker-7] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN [Executor task launch worker-7] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
... (the same "Read nothing" warning repeats many more times)
The error seems bizarre. Does anyone know what causes this? Tell me why?
However, in the Tachyon file system I can see the following:
./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
4502.29 KB  09-11-2015 20:09:02:078  Not In Memory  /data/spark/bank/bank-full.csv/bank-full.csv
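Since the listing shows the file as "Not In Memory", one thing that might be worth trying (an untested assumption here, not something the original run did) is forcing the file into Tachyon memory before reading it from Spark:

```shell
# Hypothetical workaround sketch: "tfs load" pulls a file's data from the
# under filesystem into Tachyon's memory tier (Tachyon 0.7 CLI).
# Whether this resolves the "Read nothing" warnings above is not verified.
./bin/tachyon tfs load /data/spark/bank/bank-full.csv/bank-full.csv

# Re-list to see whether the status changed from "Not In Memory"
./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
```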
Meanwhile, bank-full.csv in HDFS looks like this:
hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
In fact, Tachyon itself has loaded the bank-full.csv file and stored it in its own file system, at tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv (note the doubled name: a directory called bank-full.csv containing the file bank-full.csv).
This is configured in Tachyon's conf/tachyon-env.sh: with export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020 set, tachyon://localhost:19998 can resolve a given path to the corresponding file in HDFS.
All right, then I'll read the file via HDFS first and then save it to Tachyon:
scala> val bankfullfile = sc.textFile("/data/spark/bank/bank-full.csv")
scala> bankfullfile.count
res0: Long = 45212
scala> bankfullfile.saveAsTextFile("tachyon://master:19998/data/spark/bank/newbankfullfile")
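To confirm the write actually landed in Tachyon, a natural follow-up in the same spark-shell session (a sketch, not run here) is to read the new path back and compare line counts:

```scala
// Sketch: read back the copy that saveAsTextFile wrote to Tachyon
// (assumes the newbankfullfile path written above and a live sc).
val fromTachyon = sc.textFile("tachyon://master:19998/data/spark/bank/newbankfullfile")

// If the save succeeded, this should match the 45212 lines counted from HDFS.
fromTachyon.count()
```

The directory could also be checked from the Tachyon side with ./bin/tachyon tfs ls /data/spark/bank/newbankfullfile.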
Not finished; to be continued~