
Big Data Visualization Tool: GraphBuilder Demo

2013-12-05 23:15
Intel recently open-sourced the code of a beta version of GraphBuilder.

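GraphBuilder was developed by Intel Labs. It is the first scalable open-source Java library for big data that builds large datasets into graphs, that is, network-like structures that reflect the relationships among the data. It is meant to help scientists and data analysts in industry and academia analyze large datasets quickly.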

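GraphBuilder scales out using the MapReduce parallel programming model. The figure below shows its main components and how they relate to Hadoop MapReduce.

[Figure: main components of GraphBuilder and their relationship to Hadoop MapReduce]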

The GraphBuilder source code is released under the Apache 2 license and can be obtained from the official website.

1. Download the GraphBuilder source code from the official website

https://01.org/graphbuilder/

wget https://01.org/graphbuilder/sites/default/files/downloads/graphbuilder-1.0.tar_1.gz

2. Extract and build the GraphBuilder source code

tar zvxf graphbuilder-1.0.tar_1.gz

cd graphbuilder

mvn package

.............................................

[INFO] Reading assembly descriptor: hadoop-job.xml

[INFO] Building jar: /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 3:49.505s

[INFO] Finished at: Thu Dec 05 22:19:04 CST 2013

[INFO] Final Memory: 17M/66M

[INFO] ------------------------------------------------------------------------

[grid@localhost graphbuilder]$

The build output shows that /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar was generated.

3. Download the Wikipedia sample file and decompress it:

[grid@localhost graphbuilder]$ wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
--2013-12-05 22:23:31-- http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
Resolving dumps.wikimedia.org... 208.80.152.185

Connecting to dumps.wikimedia.org|208.80.152.185|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 43533584 (42M) [application/x-bzip]

Saving to: “enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2”

Length: 43533584 (42M) [application/x-bzip]

Saving to: ‘latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2’

100%[================================================================>] 43,533,584 24.0K/s in 26m 35s

2013-12-05 05:17:24 (26.7 KB/s) - ‘latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2’ saved [43533584/43533584]

You have new mail in /var/spool/mail/root

[root@hadoop graphbuilder]# bzip2 -d enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2

[root@hadoop graphbuilder]# ll

total 149880

drwxrwxr-x. 3 1001 1001 4096 Jun 28 20:30 demoapps

drwxrwxr-x. 6 1001 1001 4096 Jun 28 20:30 doc

-rw-r--r--. 1 root root 153424297 Dec 5 05:41 enwiki-latest-pages-articles1.xml-p000000010p000010000

4. Start Hadoop

[grid@localhost graphbuilder]$ start-all.sh

Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-namenode-h3.out

localhost: starting datanode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-datanode-localhost.localdomain.out

localhost: starting secondarynamenode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-secondarynamenode-localhost.localdomain.out

starting jobtracker, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-jobtracker-h3.out

localhost: starting tasktracker, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-tasktracker-localhost.localdomain.out

[grid@localhost graphbuilder]$

[root@hadoop sbin]# jps

4055 DataNode

4448 NodeManager

4358 ResourceManager

3968 NameNode

4741 Jps

4190 SecondaryNameNode

[root@hadoop bin]# hadoop fs -ls /

Found 4 items

drwxr-xr-x - root supergroup 0 2013-12-01 21:41 /home

drwxr-xr-x - root supergroup 0 2013-12-01 21:38 /test

drwxr-xr-x - root supergroup 0 2013-12-01 21:51 /tmp

drwxr-xr-x - root supergroup 0 2013-12-01 21:52 /tmp-output

[root@hadoop bin]# hadoop fs -mkdir /user/

[root@hadoop bin]# hadoop fs -mkdir /user/wiki-input

[root@hadoop ~]# hadoop dfs -copyFromLocal enwiki-latest-pages-articles1.xml-p000000010p000010000 /user/wiki-input

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.
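The deprecation notice above points to the `hdfs` launcher. As a rough sketch only, assuming a Hadoop release that ships `hdfs dfs` with the `-mkdir -p` option, the two upload steps could be assembled like this. The helper name `hdfs_upload_commands` is hypothetical, and the snippet only prints the commands rather than running them.

```python
import shlex

def hdfs_upload_commands(local_file, hdfs_dir):
    """Build the hdfs commands that create hdfs_dir and upload local_file into it."""
    return [
        # -p creates missing parent directories such as /user (assumed to be supported).
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-copyFromLocal", local_file, hdfs_dir],
    ]

commands = hdfs_upload_commands(
    "enwiki-latest-pages-articles1.xml-p000000010p000010000",
    "/user/wiki-input",
)
for command in commands:
    # Print only; pass each list to subprocess.run(command, check=True) to execute it.
    print(shlex.join(command))
```

Keeping each command as a list of arguments avoids shell quoting problems if the file name ever contains spaces.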

[grid@localhost ~]$ hadoop jar /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphEnd2End 3 /user/wiki-input /user/en-wiki-articles-output 2

Warning: $HADOOP_HOME is deprecated.

13/12/05 22:52:04 INFO docwordgraph.CreateWordCountGraph: ========== Creating Graph ================

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: =========== Job: Create initial graph from raw data ===========

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: input: /user/wiki-input

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: Output = /user/en-wiki-articles-output/graph_raw

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: Inputformat = com.intel.hadoop.graphbuilder.demoapps.wikipedia.WikiPageInputFormat

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: GraphTokenizer = com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphTokenizer

13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: ==================== Start ====================================

13/12/05 22:52:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:52:08 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 22:52:09 INFO mapred.JobClient: Running job: job_201312052240_0001

13/12/05 22:52:10 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:53:04 INFO mapred.JobClient: map 1% reduce 0%

13/12/05 22:53:07 INFO mapred.JobClient: map 3% reduce 0%

13/12/05 22:53:10 INFO mapred.JobClient: map 6% reduce 0%

13/12/05 22:53:13 INFO mapred.JobClient: map 8% reduce 0%

13/12/05 22:53:16 INFO mapred.JobClient: map 11% reduce 0%

13/12/05 22:53:19 INFO mapred.JobClient: map 13% reduce 0%

13/12/05 22:53:22 INFO mapred.JobClient: map 17% reduce 0%

13/12/05 22:53:25 INFO mapred.JobClient: map 21% reduce 0%

13/12/05 22:53:28 INFO mapred.JobClient: map 23% reduce 0%

13/12/05 22:53:31 INFO mapred.JobClient: map 26% reduce 0%

13/12/05 22:53:33 INFO mapred.JobClient: map 27% reduce 0%

13/12/05 22:53:36 INFO mapred.JobClient: map 29% reduce 0%

13/12/05 22:53:39 INFO mapred.JobClient: map 33% reduce 0%

13/12/05 22:53:42 INFO mapred.JobClient: map 36% reduce 0%

13/12/05 22:53:46 INFO mapred.JobClient: map 39% reduce 0%

13/12/05 22:53:49 INFO mapred.JobClient: map 43% reduce 0%

13/12/05 22:53:52 INFO mapred.JobClient: map 46% reduce 0%

13/12/05 22:53:55 INFO mapred.JobClient: map 49% reduce 0%

13/12/05 22:53:58 INFO mapred.JobClient: map 50% reduce 0%

13/12/05 22:54:01 INFO mapred.JobClient: map 51% reduce 0%

13/12/05 22:54:04 INFO mapred.JobClient: map 55% reduce 0%

13/12/05 22:54:07 INFO mapred.JobClient: map 58% reduce 0%

13/12/05 22:54:10 INFO mapred.JobClient: map 62% reduce 0%

13/12/05 22:54:13 INFO mapred.JobClient: map 65% reduce 0%

13/12/05 22:54:16 INFO mapred.JobClient: map 66% reduce 0%

13/12/05 22:54:31 INFO mapred.JobClient: map 83% reduce 0%

13/12/05 22:54:34 INFO mapred.JobClient: map 85% reduce 0%

13/12/05 22:54:37 INFO mapred.JobClient: map 95% reduce 22%

13/12/05 22:54:40 INFO mapred.JobClient: map 100% reduce 22%

13/12/05 22:54:46 INFO mapred.JobClient: map 100% reduce 33%

13/12/05 22:54:52 INFO mapred.JobClient: map 100% reduce 70%

13/12/05 22:54:55 INFO mapred.JobClient: map 100% reduce 72%

13/12/05 22:54:58 INFO mapred.JobClient: map 100% reduce 75%

13/12/05 22:55:01 INFO mapred.JobClient: map 100% reduce 80%

13/12/05 22:55:04 INFO mapred.JobClient: map 100% reduce 84%

13/12/05 22:55:07 INFO mapred.JobClient: map 100% reduce 88%

13/12/05 22:55:11 INFO mapred.JobClient: map 100% reduce 92%

13/12/05 22:55:14 INFO mapred.JobClient: map 100% reduce 97%

13/12/05 22:55:19 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 22:55:24 INFO mapred.JobClient: Job complete: job_201312052240_0001

13/12/05 22:55:24 INFO mapred.JobClient: Counters: 32

13/12/05 22:55:24 INFO mapred.JobClient: Job Counters

13/12/05 22:55:24 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 22:55:24 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=268128

13/12/05 22:55:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 22:55:24 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 22:55:24 INFO mapred.JobClient: Launched map tasks=3

13/12/05 22:55:24 INFO mapred.JobClient: Data-local map tasks=3

13/12/05 22:55:24 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=59264

13/12/05 22:55:24 INFO mapred.JobClient: File Input Format Counters

13/12/05 22:55:24 INFO mapred.JobClient: Bytes Read=153510857

13/12/05 22:55:24 INFO mapred.JobClient: com.intel.hadoop.graphbuilder.preprocess.mapreduce.CreateGraphReducer$CREATE_GRAPH_COUNTER

13/12/05 22:55:24 INFO mapred.JobClient: NUM_EDGES=662471

13/12/05 22:55:24 INFO mapred.JobClient: NUM_VERTICES=360189

13/12/05 22:55:24 INFO mapred.JobClient: File Output Format Counters

13/12/05 22:55:24 INFO mapred.JobClient: Bytes Written=28502484

13/12/05 22:55:24 INFO mapred.JobClient: FileSystemCounters

13/12/05 22:55:24 INFO mapred.JobClient: FILE_BYTES_READ=94383675

13/12/05 22:55:24 INFO mapred.JobClient: HDFS_BYTES_READ=153511323

13/12/05 22:55:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=144767981

13/12/05 22:55:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28502484

13/12/05 22:55:24 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 22:55:24 INFO mapred.JobClient: Map output materialized bytes=50292609

13/12/05 22:55:24 INFO mapred.JobClient: Map input records=6299

13/12/05 22:55:24 INFO mapred.JobClient: Reduce shuffle bytes=50292609

13/12/05 22:55:24 INFO mapred.JobClient: Spilled Records=4670039

13/12/05 22:55:24 INFO mapred.JobClient: Map output bytes=47049007

13/12/05 22:55:24 INFO mapred.JobClient: Total committed heap usage (bytes)=645189632

13/12/05 22:55:24 INFO mapred.JobClient: CPU time spent (ms)=106060

13/12/05 22:55:24 INFO mapred.JobClient: Map input bytes=153510857

13/12/05 22:55:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=453

13/12/05 22:55:24 INFO mapred.JobClient: Combine input records=0

13/12/05 22:55:24 INFO mapred.JobClient: Reduce input records=1621723

13/12/05 22:55:24 INFO mapred.JobClient: Reduce input groups=1017821

13/12/05 22:55:24 INFO mapred.JobClient: Combine output records=0

13/12/05 22:55:24 INFO mapred.JobClient: Physical memory (bytes) snapshot=725356544

13/12/05 22:55:24 INFO mapred.JobClient: Reduce output records=1022660

13/12/05 22:55:24 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1517850624

13/12/05 22:55:24 INFO mapred.JobClient: Map output records=1621723

13/12/05 22:55:24 INFO mapreduce.CreateGraphMR: =================== Done ====================================

13/12/05 22:55:24 INFO docwordgraph.CreateWordCountGraph: ========== Done creating graph ================

13/12/05 22:55:24 INFO linkgraph.LinkGraphEnd2End: Create graph finished in : 200 seconds
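The counters printed by this first job give the size of the raw link graph. The following is a minimal sketch, not part of GraphBuilder, that pulls NUM_VERTICES and NUM_EDGES out of JobClient-style log lines and derives the average number of edges per vertex. The function name `parse_counters` and the regular expression are illustrative assumptions; the sample lines are copied from the log above.

```python
import re

# Sample lines copied from the JobClient output above.
LOG = """\
13/12/05 22:55:24 INFO mapred.JobClient: NUM_EDGES=662471
13/12/05 22:55:24 INFO mapred.JobClient: NUM_VERTICES=360189
13/12/05 22:55:24 INFO mapred.JobClient: Map input records=6299
13/12/05 22:55:24 INFO mapred.JobClient: Job complete: job_201312052240_0001
"""

# A counter line ends with "<name>=<integer>" after the logger name.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(.+?)=(\d+)\s*$")

def parse_counters(text):
    """Return {counter name: int value} for every counter line in text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

counters = parse_counters(LOG)
edges_per_vertex = counters["NUM_EDGES"] / counters["NUM_VERTICES"]
print(f"vertices={counters['NUM_VERTICES']} edges={counters['NUM_EDGES']}")
print(f"average edges per vertex: {edges_per_vertex:.2f}")
```

Lines that are not counters, such as the "Job complete" line, are skipped because they do not end in `name=integer`.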

13/12/05 22:55:24 INFO linkgraph.NormalizeGraphIds: ========== Normalizing Graph ============

13/12/05 22:55:24 INFO mapreduce.HashIdMR: ====== Job: Create integer Id maps for vertices ==========

13/12/05 22:55:24 INFO mapreduce.HashIdMR: Input = /user/en-wiki-articles-output/graph_raw/vdata

13/12/05 22:55:24 INFO mapreduce.HashIdMR: Output = /user/en-wiki-articles-output/graph_norm

13/12/05 22:55:24 INFO mapreduce.HashIdMR: ==========================================================

13/12/05 22:55:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:55:26 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 22:55:27 INFO mapred.JobClient: Running job: job_201312052240_0002

13/12/05 22:55:28 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:55:46 INFO mapred.JobClient: map 100% reduce 0%

13/12/05 22:56:01 INFO mapred.JobClient: map 100% reduce 76%

13/12/05 22:56:04 INFO mapred.JobClient: map 100% reduce 84%

13/12/05 22:56:07 INFO mapred.JobClient: map 100% reduce 91%

13/12/05 22:56:13 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 22:56:18 INFO mapred.JobClient: Job complete: job_201312052240_0002

13/12/05 22:56:18 INFO mapred.JobClient: Counters: 29

13/12/05 22:56:18 INFO mapred.JobClient: Job Counters

13/12/05 22:56:18 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 22:56:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16669

13/12/05 22:56:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 22:56:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 22:56:18 INFO mapred.JobClient: Launched map tasks=1

13/12/05 22:56:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=25850

13/12/05 22:56:18 INFO mapred.JobClient: File Input Format Counters

13/12/05 22:56:18 INFO mapred.JobClient: Bytes Read=7029652

13/12/05 22:56:18 INFO mapred.JobClient: File Output Format Counters

13/12/05 22:56:18 INFO mapred.JobClient: Bytes Written=12210267

13/12/05 22:56:18 INFO mapred.JobClient: FileSystemCounters

13/12/05 22:56:18 INFO mapred.JobClient: FILE_BYTES_READ=18381646

13/12/05 22:56:18 INFO mapred.JobClient: HDFS_BYTES_READ=7029788

13/12/05 22:56:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=27617033

13/12/05 22:56:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=12210267

13/12/05 22:56:18 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 22:56:18 INFO mapred.JobClient: Map output materialized bytes=9190820

13/12/05 22:56:18 INFO mapred.JobClient: Map input records=360189

13/12/05 22:56:18 INFO mapred.JobClient: Reduce shuffle bytes=9190820

13/12/05 22:56:18 INFO mapred.JobClient: Spilled Records=1080567

13/12/05 22:56:18 INFO mapred.JobClient: Map output bytes=8470422

13/12/05 22:56:18 INFO mapred.JobClient: Total committed heap usage (bytes)=177016832

13/12/05 22:56:18 INFO mapred.JobClient: CPU time spent (ms)=17650

13/12/05 22:56:18 INFO mapred.JobClient: Map input bytes=7029652

13/12/05 22:56:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=136

13/12/05 22:56:18 INFO mapred.JobClient: Combine input records=0

13/12/05 22:56:18 INFO mapred.JobClient: Reduce input records=360189

13/12/05 22:56:18 INFO mapred.JobClient: Reduce input groups=360189

13/12/05 22:56:18 INFO mapred.JobClient: Combine output records=0

13/12/05 22:56:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=241008640

13/12/05 22:56:18 INFO mapred.JobClient: Reduce output records=720378

13/12/05 22:56:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=760651776

13/12/05 22:56:18 INFO mapred.JobClient: Map output records=360189

13/12/05 22:56:18 INFO mapreduce.HashIdMR: =======================Done =====================

13/12/05 22:56:18 INFO mapreduce.SortDictMR: ========== Job: Partition the map of rawid -> id ===========

13/12/05 22:56:18 INFO mapreduce.SortDictMR: Input = /user/en-wiki-articles-output/graph_norm/vidmap

13/12/05 22:56:18 INFO mapreduce.SortDictMR: Output = /user/en-wiki-articles-output/graph_norm/temp/partitionedvidmap

13/12/05 22:56:18 INFO mapreduce.SortDictMR: ======================================================

13/12/05 22:56:18 INFO mapreduce.SortDictMR: Partition on rawId.

13/12/05 22:56:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:56:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library

13/12/05 22:56:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]

13/12/05 22:56:19 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 22:56:19 INFO mapred.JobClient: Running job: job_201312052240_0003

13/12/05 22:56:20 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:56:40 INFO mapred.JobClient: map 100% reduce 0%

13/12/05 22:56:55 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 22:57:12 INFO mapred.JobClient: Job complete: job_201312052240_0003

13/12/05 22:57:12 INFO mapred.JobClient: Counters: 30

13/12/05 22:57:12 INFO mapred.JobClient: Job Counters

13/12/05 22:57:12 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 22:57:12 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=26847

13/12/05 22:57:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 22:57:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 22:57:12 INFO mapred.JobClient: Launched map tasks=2

13/12/05 22:57:12 INFO mapred.JobClient: Data-local map tasks=2

13/12/05 22:57:12 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=26028

13/12/05 22:57:12 INFO mapred.JobClient: File Input Format Counters

13/12/05 22:57:12 INFO mapred.JobClient: Bytes Read=9082303

13/12/05 22:57:12 INFO mapred.JobClient: File Output Format Counters

13/12/05 22:57:12 INFO mapred.JobClient: Bytes Written=0

13/12/05 22:57:12 INFO mapred.JobClient: FileSystemCounters

13/12/05 22:57:12 INFO mapred.JobClient: FILE_BYTES_READ=11240848

13/12/05 22:57:12 INFO mapred.JobClient: HDFS_BYTES_READ=9082579

13/12/05 22:57:12 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22633705

13/12/05 22:57:12 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=9079676

13/12/05 22:57:12 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 22:57:12 INFO mapred.JobClient: Map output materialized bytes=11240854

13/12/05 22:57:12 INFO mapred.JobClient: Map input records=360189

13/12/05 22:57:12 INFO mapred.JobClient: Reduce shuffle bytes=5639237

13/12/05 22:57:12 INFO mapred.JobClient: Spilled Records=720378

13/12/05 22:57:12 INFO mapred.JobClient: Map output bytes=10520448

13/12/05 22:57:12 INFO mapred.JobClient: Total committed heap usage (bytes)=345362432

13/12/05 22:57:12 INFO mapred.JobClient: CPU time spent (ms)=6730

13/12/05 22:57:12 INFO mapred.JobClient: Map input bytes=9079676

13/12/05 22:57:12 INFO mapred.JobClient: SPLIT_RAW_BYTES=276

13/12/05 22:57:12 INFO mapred.JobClient: Combine input records=0

13/12/05 22:57:12 INFO mapred.JobClient: Reduce input records=360189

13/12/05 22:57:12 INFO mapred.JobClient: Reduce input groups=64

13/12/05 22:57:12 INFO mapred.JobClient: Combine output records=0

13/12/05 22:57:12 INFO mapred.JobClient: Physical memory (bytes) snapshot=440111104

13/12/05 22:57:12 INFO mapred.JobClient: Reduce output records=0

13/12/05 22:57:12 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1182158848

13/12/05 22:57:12 INFO mapred.JobClient: Map output records=360189

13/12/05 22:57:12 INFO mapreduce.SortDictMR: ======================= Done ==========================

13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: ==== Job: Partition the input edges by hash(sourceid) =========

13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: Input = /user/en-wiki-articles-output/graph_raw/edata

13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: Output = /user/en-wiki-articles-output/graph_norm/temp/partitionededata

13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: ===============================================================

13/12/05 22:57:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 22:57:13 INFO mapred.JobClient: Running job: job_201312052240_0004

13/12/05 22:57:14 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:57:35 INFO mapred.JobClient: map 100% reduce 0%

13/12/05 22:57:56 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 22:58:01 INFO mapred.JobClient: Job complete: job_201312052240_0004

13/12/05 22:58:01 INFO mapred.JobClient: Counters: 30

13/12/05 22:58:01 INFO mapred.JobClient: Job Counters

13/12/05 22:58:01 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 22:58:01 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=39329

13/12/05 22:58:01 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 22:58:01 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 22:58:01 INFO mapred.JobClient: Launched map tasks=2

13/12/05 22:58:01 INFO mapred.JobClient: Data-local map tasks=2

13/12/05 22:58:01 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13465

13/12/05 22:58:01 INFO mapred.JobClient: File Input Format Counters

13/12/05 22:58:01 INFO mapred.JobClient: Bytes Read=21476129

13/12/05 22:58:01 INFO mapred.JobClient: File Output Format Counters

13/12/05 22:58:01 INFO mapred.JobClient: Bytes Written=21472832

13/12/05 22:58:01 INFO mapred.JobClient: FileSystemCounters

13/12/05 22:58:01 INFO mapred.JobClient: FILE_BYTES_READ=50895706

13/12/05 22:58:01 INFO mapred.JobClient: HDFS_BYTES_READ=21476401

13/12/05 22:58:01 INFO mapred.JobClient: FILE_BYTES_WRITTEN=76410296

13/12/05 22:58:01 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=21472832

13/12/05 22:58:01 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 22:58:01 INFO mapred.JobClient: Map output materialized bytes=25447850

13/12/05 22:58:01 INFO mapred.JobClient: Map input records=662471

13/12/05 22:58:01 INFO mapred.JobClient: Reduce shuffle bytes=12729709

13/12/05 22:58:01 INFO mapred.JobClient: Spilled Records=1987413

13/12/05 22:58:01 INFO mapred.JobClient: Map output bytes=24122803

13/12/05 22:58:01 INFO mapred.JobClient: Total committed heap usage (bytes)=369000448

13/12/05 22:58:01 INFO mapred.JobClient: CPU time spent (ms)=10870

13/12/05 22:58:01 INFO mapred.JobClient: Map input bytes=21472832

13/12/05 22:58:01 INFO mapred.JobClient: SPLIT_RAW_BYTES=272

13/12/05 22:58:01 INFO mapred.JobClient: Combine input records=0

13/12/05 22:58:01 INFO mapred.JobClient: Reduce input records=662471

13/12/05 22:58:01 INFO mapred.JobClient: Reduce input groups=64

13/12/05 22:58:01 INFO mapred.JobClient: Combine output records=0

13/12/05 22:58:01 INFO mapred.JobClient: Physical memory (bytes) snapshot=453345280

13/12/05 22:58:01 INFO mapred.JobClient: Reduce output records=662471

13/12/05 22:58:01 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1138835456

13/12/05 22:58:01 INFO mapred.JobClient: Map output records=662471

13/12/05 22:58:01 INFO mapreduce.SortEdgeMR: =================== Done ====================================

13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: ============= Job: Normalize Ids in Edges ====================

13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Input = /user/en-wiki-articles-output/graph_norm/temp/partitionededata

13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Output = /user/en-wiki-articles-output/graph_norm/edata

13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Dictionary = /user/en-wiki-articles-output/graph_norm/temp/partitionedvidmap

13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: ===============================================================

13/12/05 22:58:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:58:02 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 22:58:02 INFO mapred.JobClient: Running job: job_201312052240_0005

13/12/05 22:58:03 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:58:23 INFO mapred.JobClient: map 68% reduce 0%

13/12/05 22:58:26 INFO mapred.JobClient: map 99% reduce 0%

13/12/05 22:58:29 INFO mapred.JobClient: map 100% reduce 0%

13/12/05 22:58:44 INFO mapred.JobClient: map 100% reduce 86%

13/12/05 22:58:50 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 22:58:55 INFO mapred.JobClient: Job complete: job_201312052240_0005

13/12/05 22:58:55 INFO mapred.JobClient: Counters: 30

13/12/05 22:58:55 INFO mapred.JobClient: Job Counters

13/12/05 22:58:55 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 22:58:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=39357

13/12/05 22:58:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 22:58:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 22:58:55 INFO mapred.JobClient: Launched map tasks=2

13/12/05 22:58:55 INFO mapred.JobClient: Data-local map tasks=2

13/12/05 22:58:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19826

13/12/05 22:58:55 INFO mapred.JobClient: File Input Format Counters

13/12/05 22:58:55 INFO mapred.JobClient: Bytes Read=21476129

13/12/05 22:58:55 INFO mapred.JobClient: File Output Format Counters

13/12/05 22:58:55 INFO mapred.JobClient: Bytes Written=8911036

13/12/05 22:58:55 INFO mapred.JobClient: FileSystemCounters

13/12/05 22:58:55 INFO mapred.JobClient: FILE_BYTES_READ=40407036

13/12/05 22:58:55 INFO mapred.JobClient: HDFS_BYTES_READ=39777374

13/12/05 22:58:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60678227

13/12/05 22:58:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8911036

13/12/05 22:58:55 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 22:58:55 INFO mapred.JobClient: Map output materialized bytes=20203515

13/12/05 22:58:55 INFO mapred.JobClient: Map input records=662471

13/12/05 22:58:55 INFO mapred.JobClient: Reduce shuffle bytes=10183173

13/12/05 22:58:55 INFO mapred.JobClient: Spilled Records=1987413

13/12/05 22:58:55 INFO mapred.JobClient: Map output bytes=18878543

13/12/05 22:58:55 INFO mapred.JobClient: Total committed heap usage (bytes)=362643456

13/12/05 22:58:55 INFO mapred.JobClient: CPU time spent (ms)=18980

13/12/05 22:58:55 INFO mapred.JobClient: Map input bytes=21472832

13/12/05 22:58:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=306

13/12/05 22:58:55 INFO mapred.JobClient: Combine input records=0

13/12/05 22:58:55 INFO mapred.JobClient: Reduce input records=662471

13/12/05 22:58:55 INFO mapred.JobClient: Reduce input groups=64

13/12/05 22:58:55 INFO mapred.JobClient: Combine output records=0

13/12/05 22:58:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=462606336

13/12/05 22:58:55 INFO mapred.JobClient: Reduce output records=662471

13/12/05 22:58:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1139138560

13/12/05 22:58:55 INFO mapred.JobClient: Map output records=662471

13/12/05 22:58:55 INFO mapreduce.TransEdgeMR: ========================= Done ===============================

13/12/05 22:58:55 INFO linkgraph.NormalizeGraphIds: ========== Done normalizing graph ============

13/12/05 22:58:55 INFO linkgraph.LinkGraphEnd2End: Normalize graph finished in : 211 seconds
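The stage durations reported by LinkGraphEnd2End can be cross-checked against the log timestamps. A small sketch, assuming the two-digit year/month/day prefix seen in the log lines above; the function name `elapsed_seconds` is hypothetical and the boundary timestamps are copied from the log.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "13/12/05 22:52:04".
STAMP_FORMAT = "%y/%m/%d %H:%M:%S"

def elapsed_seconds(start, end):
    """Whole seconds between two log timestamps given in STAMP_FORMAT."""
    delta = datetime.strptime(end, STAMP_FORMAT) - datetime.strptime(start, STAMP_FORMAT)
    return int(delta.total_seconds())

# Boundaries: the first "Creating Graph" line, the "Create graph finished"
# line, and the "Normalize graph finished" line.
create_graph = elapsed_seconds("13/12/05 22:52:04", "13/12/05 22:55:24")
normalize_graph = elapsed_seconds("13/12/05 22:55:24", "13/12/05 22:58:55")

print(create_graph, normalize_graph)
```

Both values agree with the 200 and 211 seconds that the log itself reports for the two stages.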

13/12/05 22:58:55 INFO linkgraph.PartitionGraph: ========== Partitioning Graph ============

13/12/05 22:58:58 INFO edge.EdgeIngressMR: ===== Job: Partition edges and create vertex records =========

13/12/05 22:58:58 INFO edge.EdgeIngressMR: input: /user/en-wiki-articles-output/graph_norm/vdata,/user/en-wiki-articles-output/graph_norm/edata

13/12/05 22:58:58 INFO edge.EdgeIngressMR: output: /user/en-wiki-articles-output/graph_partitioned/edges

13/12/05 22:58:58 INFO edge.EdgeIngressMR: numProc = 3

13/12/05 22:58:58 INFO edge.EdgeIngressMR: subpartPerPartition = 8

13/12/05 22:58:58 INFO edge.EdgeIngressMR: keyclass = generatedclass.MyIngressJobKey0

13/12/05 22:58:58 INFO edge.EdgeIngressMR: valclass = generatedclass.MyIngressJobVal0

13/12/05 22:58:58 INFO edge.EdgeIngressMR: ingress = constrainedrandom

13/12/05 22:58:58 INFO edge.EdgeIngressMR: gzip = false

13/12/05 22:58:58 INFO edge.EdgeIngressMR: ===============================================================

13/12/05 22:58:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 22:58:59 INFO mapred.FileInputFormat: Total input paths to process : 2

13/12/05 22:58:59 INFO mapred.JobClient: Running job: job_201312052240_0006

13/12/05 22:59:00 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 22:59:18 INFO mapred.JobClient: map 34% reduce 0%

13/12/05 22:59:24 INFO mapred.JobClient: map 37% reduce 0%

13/12/05 22:59:27 INFO mapred.JobClient: map 38% reduce 0%

13/12/05 22:59:36 INFO mapred.JobClient: map 41% reduce 0%

13/12/05 22:59:39 INFO mapred.JobClient: map 43% reduce 0%

13/12/05 22:59:42 INFO mapred.JobClient: map 47% reduce 0%

13/12/05 22:59:51 INFO mapred.JobClient: map 49% reduce 0%

13/12/05 22:59:54 INFO mapred.JobClient: map 53% reduce 0%

13/12/05 22:59:57 INFO mapred.JobClient: map 54% reduce 0%

13/12/05 23:00:09 INFO mapred.JobClient: map 55% reduce 0%

13/12/05 23:00:12 INFO mapred.JobClient: map 57% reduce 0%

13/12/05 23:00:15 INFO mapred.JobClient: map 64% reduce 0%

13/12/05 23:00:19 INFO mapred.JobClient: map 72% reduce 11%

13/12/05 23:00:22 INFO mapred.JobClient: map 76% reduce 11%

13/12/05 23:00:25 INFO mapred.JobClient: map 77% reduce 11%

13/12/05 23:00:31 INFO mapred.JobClient: map 78% reduce 11%

13/12/05 23:00:34 INFO mapred.JobClient: map 84% reduce 11%

13/12/05 23:00:37 INFO mapred.JobClient: map 91% reduce 11%

13/12/05 23:00:40 INFO mapred.JobClient: map 95% reduce 11%

13/12/05 23:00:43 INFO mapred.JobClient: map 97% reduce 11%

13/12/05 23:00:55 INFO mapred.JobClient: map 99% reduce 11%

13/12/05 23:00:58 INFO mapred.JobClient: map 100% reduce 11%

13/12/05 23:02:29 INFO mapred.JobClient: map 100% reduce 22%

13/12/05 23:02:35 INFO mapred.JobClient: map 100% reduce 76%

13/12/05 23:02:38 INFO mapred.JobClient: map 100% reduce 80%

13/12/05 23:02:41 INFO mapred.JobClient: map 100% reduce 84%

13/12/05 23:02:44 INFO mapred.JobClient: map 100% reduce 88%

13/12/05 23:02:47 INFO mapred.JobClient: map 100% reduce 92%

13/12/05 23:02:50 INFO mapred.JobClient: map 100% reduce 97%

13/12/05 23:02:56 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 23:03:02 INFO mapred.JobClient: Job complete: job_201312052240_0006

13/12/05 23:03:02 INFO mapred.JobClient: Counters: 30

13/12/05 23:03:02 INFO mapred.JobClient: Job Counters

13/12/05 23:03:02 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 23:03:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=379790

13/12/05 23:03:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 23:03:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 23:03:02 INFO mapred.JobClient: Launched map tasks=3

13/12/05 23:03:02 INFO mapred.JobClient: Data-local map tasks=3

13/12/05 23:03:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=182772

13/12/05 23:03:02 INFO mapred.JobClient: File Input Format Counters

13/12/05 23:03:02 INFO mapred.JobClient: Bytes Read=12041935

13/12/05 23:03:02 INFO mapred.JobClient: File Output Format Counters

13/12/05 23:03:02 INFO mapred.JobClient: Bytes Written=27761437

13/12/05 23:03:02 INFO mapred.JobClient: FileSystemCounters

13/12/05 23:03:02 INFO mapred.JobClient: FILE_BYTES_READ=64995481

13/12/05 23:03:02 INFO mapred.JobClient: HDFS_BYTES_READ=12042346

13/12/05 23:03:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=96983574

13/12/05 23:03:02 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27761437

13/12/05 23:03:02 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 23:03:02 INFO mapred.JobClient: Map output materialized bytes=31893712

13/12/05 23:03:02 INFO mapred.JobClient: Map input records=1022660

13/12/05 23:03:02 INFO mapred.JobClient: Reduce shuffle bytes=31893712

13/12/05 23:03:02 INFO mapred.JobClient: Spilled Records=2210272

13/12/05 23:03:02 INFO mapred.JobClient: Map output bytes=67430035

13/12/05 23:03:02 INFO mapred.JobClient: Total committed heap usage (bytes)=633384960

13/12/05 23:03:02 INFO mapred.JobClient: CPU time spent (ms)=110260

13/12/05 23:03:02 INFO mapred.JobClient: Map input bytes=12041627

13/12/05 23:03:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=411

13/12/05 23:03:02 INFO mapred.JobClient: Combine input records=2747113

13/12/05 23:03:02 INFO mapred.JobClient: Reduce input records=725283

13/12/05 23:03:02 INFO mapred.JobClient: Reduce input groups=360192

13/12/05 23:03:02 INFO mapred.JobClient: Combine output records=1124797

13/12/05 23:03:02 INFO mapred.JobClient: Physical memory (bytes) snapshot=777744384

13/12/05 23:03:02 INFO mapred.JobClient: Reduce output records=360189

13/12/05 23:03:02 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1516822528

13/12/05 23:03:02 INFO mapred.JobClient: Map output records=2347602

13/12/05 23:03:02 INFO edge.EdgeIngressMR: ================== Done ====================================

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: ====== Job: Distributed Vertex Records to partitions =========

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: input: /user/en-wiki-articles-output/graph_partitioned/edges/vrecord

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: output: /user/en-wiki-articles-output/graph_partitioned/vrecords

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: numProc = 3

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: gzip = false

13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: ==============================================================

13/12/05 23:03:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/05 23:03:03 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/05 23:03:03 INFO mapred.JobClient: Running job: job_201312052240_0007

13/12/05 23:03:04 INFO mapred.JobClient: map 0% reduce 0%

13/12/05 23:03:27 INFO mapred.JobClient: map 100% reduce 0%

13/12/05 23:03:45 INFO mapred.JobClient: map 100% reduce 66%

13/12/05 23:03:48 INFO mapred.JobClient: map 100% reduce 77%

13/12/05 23:03:54 INFO mapred.JobClient: map 100% reduce 88%

13/12/05 23:04:03 INFO mapred.JobClient: map 100% reduce 100%

13/12/05 23:04:08 INFO mapred.JobClient: Job complete: job_201312052240_0007

13/12/05 23:04:08 INFO mapred.JobClient: Counters: 32

13/12/05 23:04:08 INFO mapred.JobClient: Job Counters

13/12/05 23:04:08 INFO mapred.JobClient: Launched reduce tasks=1

13/12/05 23:04:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33341

13/12/05 23:04:08 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/12/05 23:04:08 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/12/05 23:04:08 INFO mapred.JobClient: Launched map tasks=2

13/12/05 23:04:08 INFO mapred.JobClient: Data-local map tasks=2

13/12/05 23:04:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=32004

13/12/05 23:04:08 INFO mapred.JobClient: File Input Format Counters

13/12/05 23:04:08 INFO mapred.JobClient: Bytes Read=27762064

13/12/05 23:04:08 INFO mapred.JobClient: File Output Format Counters

13/12/05 23:04:08 INFO mapred.JobClient: Bytes Written=35581968

13/12/05 23:04:08 INFO mapred.JobClient: FileSystemCounters

13/12/05 23:04:08 INFO mapred.JobClient: FILE_BYTES_READ=38336865

13/12/05 23:04:08 INFO mapred.JobClient: HDFS_BYTES_READ=27762368

13/12/05 23:04:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=76739897

13/12/05 23:04:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=35581968

13/12/05 23:04:08 INFO mapred.JobClient: com.intel.hadoop.graphbuilder.partition.mapreduce.vrecord.VrecordIngressReducer$COUNTER

13/12/05 23:04:08 INFO mapred.JobClient: OWN_VERTICES=360189

13/12/05 23:04:08 INFO mapred.JobClient: VERTICES=459172

13/12/05 23:04:08 INFO mapred.JobClient: Map-Reduce Framework

13/12/05 23:04:08 INFO mapred.JobClient: Map output materialized bytes=38336871

13/12/05 23:04:08 INFO mapred.JobClient: Map input records=360189

13/12/05 23:04:08 INFO mapred.JobClient: Reduce shuffle bytes=19050294

13/12/05 23:04:08 INFO mapred.JobClient: Spilled Records=918344

13/12/05 23:04:08 INFO mapred.JobClient: Map output bytes=37418515

13/12/05 23:04:08 INFO mapred.JobClient: Total committed heap usage (bytes)=390500352

13/12/05 23:04:08 INFO mapred.JobClient: CPU time spent (ms)=22210

13/12/05 23:04:08 INFO mapred.JobClient: Map input bytes=27761437

13/12/05 23:04:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=304

13/12/05 23:04:08 INFO mapred.JobClient: Combine input records=0

13/12/05 23:04:08 INFO mapred.JobClient: Reduce input records=459172

13/12/05 23:04:08 INFO mapred.JobClient: Reduce input groups=3

13/12/05 23:04:08 INFO mapred.JobClient: Combine output records=0

13/12/05 23:04:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=471478272

13/12/05 23:04:08 INFO mapred.JobClient: Reduce output records=459175

13/12/05 23:04:08 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1139294208

13/12/05 23:04:08 INFO mapred.JobClient: Map output records=459172

13/12/05 23:04:08 INFO vrecord.VrecordIngressMR: ==========================Done===============================

13/12/05 23:04:08 INFO linkgraph.PartitionGraph: ========== Done partitioning graph ============

13/12/05 23:04:08 INFO linkgraph.LinkGraphEnd2End: Partition graph finished in : 312 seconds

13/12/05 23:04:08 INFO linkgraph.LinkGraphEnd2End: Total flow time : 723 seconds
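The counters in the log above are easier to compare once they are pulled out of the console text. Below is a minimal sketch, assuming the JobClient output for a single job has been saved as plain text. The names `COUNTER_RE` and `parse_counters` are hypothetical helpers written for this post, not part of Hadoop or GraphBuilder. If the text covers more than one job, later values overwrite earlier ones with the same counter name.

```python
import re

# Matches counter lines such as
# "13/12/05 23:04:08 INFO mapred.JobClient: Map input records=360189".
# Group headers ("Job Counters") and progress lines have no NAME=VALUE
# part and are skipped.
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every NAME=VALUE line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


# A few lines copied from the second job in the log above.
sample = """\
13/12/05 23:04:08 INFO mapred.JobClient: Counters: 32
13/12/05 23:04:08 INFO mapred.JobClient: OWN_VERTICES=360189
13/12/05 23:04:08 INFO mapred.JobClient: VERTICES=459172
13/12/05 23:04:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=471478272
"""

counters = parse_counters(sample)
for name, value in sorted(counters.items()):
    print(name, value)
```

Keeping the values as integers makes it straightforward to compare counters between runs, for example after changing the number of partitions.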

After the job completes, the results are written to HDFS. Listing the output directory shows three subdirectories:

[grid@localhost ~]$ hadoop dfs -ls /user/en-wiki-articles-output

Warning: $HADOOP_HOME is deprecated.

Found 3 items

drwxr-xr-x - grid supergroup 0 2013-12-05 22:58 /user/en-wiki-articles-output/graph_norm

drwxr-xr-x - grid supergroup 0 2013-12-05 23:03 /user/en-wiki-articles-output/graph_partitioned

drwxr-xr-x - grid supergroup 0 2013-12-05 22:55 /user/en-wiki-articles-output/graph_raw

How to actually visualize this output is not yet clear to me; I am still learning.