您的位置:首页 > 其它

Trafodion数据加载性能测试

2016-12-15 09:37 155 查看
在之前写的另外一篇文章“Trafodion 数据加载介绍”中提到,Trafodion数据加载主要有以下四种方式,

Insert

Upsert

Upsert Using Load

Load

从性能上面来看,这四种加载方式是依次递增的,即Load >Upsert Using Load >Upsert >Insert。

下面我们就现场验证一下这几种数据加载方式各自的性能,我所使用的集群配置为:4个节点的Hadoop,Trafodion装在3个节点的RegionServer上,每个Node的内存为132G。现在通过以下步骤来测试性能,

数据准备—创建两张表相同结构的表,一个表3千万数据,另外一张为空表

SQL>select count(*) from eboxdata_30m;

(EXPR)
--------------------
30000000

--- 1 row(s) selected.
SQL>create table eboxdata_30m_new like eboxdata_30m;

--- SQL operation complete.


Insert测试—由于insert语句最慢,我们通过qid拿到当前insert语句的统计信息发现,在11分45秒的时候约插入20万条记录,这意味着,每秒插入数据约300条。如果3千万全部插入,则约需要耗时27个小时。

SQL>insert into eboxdata_30m_new select * from eboxdata_30m;
SQL>get statistics for qid MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7 default;
....

Qid                      MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7
Compile Start Time       2016/12/14 13:11:24.930565
Compile End Time         2016/12/14 13:11:26.585146
Compile Elapsed Time                 0:00:01.654581
Execute Start Time       2016/12/14 13:11:26.585298
Execute End Time         -1
Execute Elapsed Time                 0:11:45.164579
State                    OPEN
Rows Affected            -1
SQL Error Code           0
Stats Error Code         0
Query Type               SQL_INSERT_NON_UNIQUE
Sub Query Type           SQL_STMT_NA
Estimated Accessed Rows  0
Estimated Used Rows      0
Parent Qid               NONE
Parent Query System      NONE
Child Qid                NONE
Number of SQL Processes  19
Number of Cpus           3
Transaction Id           4294967298
Source String            insert into eboxdata_30m_new select * from eboxdata_30m
SQL Source Length        55
Rows Returned            0
First Row Returned Time  -1
Last Error before AQR    0
Number of AQR retries    0
Delay before AQR         0
No. of times reclaimed   0
Cancel Time              -1
Last Suspend Time        -1
Stats Collection Type    OPERATOR_STATS

LC   RC   Id   PaId ExId Frag TDB Name                 DOP     Dispatches      Oper CPU Time  Est. Records Used  Act. Records Used    Details

12   .    13   .    7    0    EX_ROOT                  1                1                 29                  0                  0 1174
11   .    12   13   6    0    EX_SPLIT_TOP             1                1                  7         29,999,970                  0
10   .    11   12   6    0    EX_SEND_TOP              6               18              1,138         29,999,970                  0
9    .    10   11   6    2    EX_SEND_BOTTOM           6               12                558         29,999,970                  0
8    .    9    10   6    2    EX_SPLIT_BOTTOM          6               48              1,280         29,999,970                  0 261561976
6    7    8    9    5    2    EX_TUPLE_FLOW            6            3,177            160,763         29,999,970                  0
.    .    7    8    4    2    EX_TRAF_INSERT           6            3,171         39,525,341                  1            197,967 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|54230353
5    .    6    8    3    2    EX_SORT                  6        1,571,909        149,789,880         29,999,970            200,388 704|89924|24
4    .    5    6    2    2    EX_SPLIT_TOP             6        1,568,842          4,135,959         29,999,970         29,999,833
3    .    4    5    2    2    EX_SEND_TOP              72       1,579,269         67,948,195         29,999,970         29,999,833
2    .    3    4    2    3    EX_SEND_BOTTOM           72       3,158,220        457,945,988         29,999,970         29,999,909
1    .    2    3    2    3    EX_SPLIT_BOTTOM          12       3,157,637          9,283,550         29,999,970         30,000,000 628902186
.    .    1    2    1    3    EX_TRAF_KEY_SELECT       23       3,026,037        309,730,635         29,999,970         30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500

--- SQL operation complete.


Upsert测试—同样使用上述方法,拿到upsert执行语句的统计信息,发现在6分钟的时候已经插入了将近9百万的数据了,以此估算每秒约插入数据约2.5万条,全部插入约需20分钟。

SQL>control query cancel qid MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7;
SQL>upsert into eboxdata_30m_new select * from eboxdata_30m;
SQL>get statistics for qid MXID11001049515212347845223198599000000000206U3333300_522_SQL_CUR_29 default;
....

Qid                      MXID11001049515212347845223198599000000000206U3333300_522_SQL_CUR_29
Compile Start Time       2016/12/14 13:35:00.893745
Compile End Time         2016/12/14 13:35:07.080435
Compile Elapsed Time                 0:00:06.186690
Execute Start Time       2016/12/14 13:35:07.080609
Execute End Time         -1
Execute Elapsed Time                 0:06:09.589123
State                    OPEN
Rows Affected            -1
SQL Error Code           0
Stats Error Code         0
Query Type               SQL_INSERT_NON_UNIQUE
Sub Query Type           SQL_STMT_NA
Estimated Accessed Rows  0
Estimated Used Rows      0
Parent Qid               NONE
Parent Query System      NONE
Child Qid                NONE
Number of SQL Processes  19
Number of Cpus           3
Transaction Id           4294967299
Source String            upsert into eboxdata_30m_new select * from eboxdata_30m
SQL Source Length        55
Rows Returned            0
First Row Returned Time  -1
Last Error before AQR    0
Number of AQR retries    0
Delay before AQR         0
No. of times reclaimed   0
Cancel Time              -1
Last Suspend Time        -1
Stats Collection Type    OPERATOR_STATS

LC   RC   Id   PaId ExId Frag TDB Name                 DOP     Dispatches      Oper CPU Time  Est. Records Used  Act. Records Used    Details

12   .    13   .    7    0    EX_ROOT                  1                1                 30                  0                  0 1382
11   .    12   13   6    0    EX_SPLIT_TOP             1                1                 31         29,999,970                  0
10   .    11   12   6    0    EX_SEND_TOP              6               18              1,321         29,999,970                  0
9    .    10   11   6    2    EX_SEND_BOTTOM           6               12                517         29,999,970                  0
8    .    9    10   6    2    EX_SPLIT_BOTTOM          6               48              1,106         29,999,970                  0 276976607
6    7    8    9    5    2    EX_TUPLE_FLOW            6          141,906          1,161,857         29,999,970                  0
.    .    7    8    4    2    EX_TRAF_VSBB_UPSERT      6          141,900         34,293,113                  1          8,928,848 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|2454851950
5    .    6    8    3    2    EX_SORT                  6        1,702,355        167,387,942         29,999,970          8,936,375 27163|89932|24
4    .    5    6    2    2    EX_SPLIT_TOP             6        1,560,567          4,261,383         29,999,970         29,999,819
3    .    4    5    2    2    EX_SEND_TOP              72       1,579,262         69,870,689         29,999,970         29,999,819
2    .    3    4    2    3    EX_SEND_BOTTOM           72       3,158,252        460,964,057         29,999,970         29,999,895
1    .    2    3    2    3    EX_SPLIT_BOTTOM          12       3,157,635          9,503,227         29,999,970         30,000,000 635988911
.    .    1    2    1    3    EX_TRAF_KEY_SELECT       23       3,026,053        317,312,896         29,999,970         30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500

--- SQL operation complete.


Upsert Using Load测试,从下面的结果可以看出,导入3千万的数据花了12分钟,这比upsert语句要快将近一倍,相当于每秒4万多条记录。

SQL>upsert using load into eboxdata_30m_new select * from eboxdata_30m;
....

--- 30000000 row(s) inserted.

Start Time             2016/12/15 09:16:22.879596
End Time               2016/12/15 09:28:23.098801
Elapsed Time                      00:12:00.219205
Compile Time                      00:00:00.046010
Execution Time                    00:12:00.172971

Table Name             Records        Records      Hdfs       Hdfs I/O    Hdfs Access
Accessed           Used      I/Os          Bytes     Time(usec)
TRAFODION.PKSAAS.EBOXDATA_30M
30000000       30000000      3017     7527968500        9287063
TRAFODION.PKSAAS.EBOXDATA_30M_NEW
0       30000000    247956     8248464412      889703853

SQL>get statistics for qid MXID11000047548212348461421382158000000000406U3333300_1108_SQL_CUR_14 default;
....

LC   RC   Id   PaId ExId Frag TDB Name                 DOP     Dispatches      Oper CPU Time  Est. Records Used  Act. Records Used    Details

12   .    13   .    7    0    EX_ROOT                  1                1                 41                  0                  0 1133
11   .    12   13   6    0    EX_SPLIT_TOP             1                1                  6         29,999,970                  0
10   .    11   12   6    0    EX_SEND_TOP              6               18              1,086         29,999,970                  0
9    .    10   11   6    2    EX_SEND_BOTTOM           6               12                399         29,999,970                  0
8    .    9    10   6    2    EX_SPLIT_BOTTOM          6               43                447         29,999,970                  0 268009646
6    7    8    9    5    2    EX_TUPLE_FLOW            6          190,246          1,046,102         29,999,970                  0
.    .    7    8    4    2    EX_TRAF_VSBB_UPSERT_LOAD 6          190,238         29,003,168                  1          5,751,963 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|1581492354
5    .    6    8    3    2    EX_SORT                  6        1,753,154        164,131,820         29,999,970          5,752,791 17520|89908|24
4    .    5    6    2    2    EX_SPLIT_TOP             6        1,563,018          4,278,627         29,999,970         29,999,835
3    .    4    5    2    2    EX_SEND_TOP              72       1,579,278         69,549,083         29,999,970         29,999,835
2    .    3    4    2    3    EX_SEND_BOTTOM           72       3,158,237        459,331,880         29,999,970         29,999,911
1    .    2    3    2    3    EX_SPLIT_BOTTOM          12       3,157,643          9,414,684         29,999,970         30,000,000 590013078
.    .    1    2    1    3    EX_TRAF_KEY_SELECT       23       3,025,840        232,149,739         29,999,970         30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500

--- SQL operation complete.


Load测试,通过以下结果可以发现,load最快,从统计信息可以看出它走的是hbase bulk load,总用时花了42秒,相当于每秒70万条记录,这远远超过了上述三种加载方式。

SQL>load into eboxdata_30m_new select * from eboxdata_30m;

UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD             Status: Started    Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW
Task:  CLEANUP         Status: Started    Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW
Task:  CLEANUP         Status: Ended      Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW
Task:  PREPARATION     Status: Started    Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW
Rows Processed: 30000000
Task:  PREPARATION     Status: Ended      ET: 00:00:39.754
Task:  COMPLETION      Status: Started    Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW
Task:  COMPLETION      Status: Ended      ET: 00:00:01.852

--- SQL operation complete.

Start Time             2016/12/15 09:33:20.473559
End Time               2016/12/15 09:34:02.888691
Elapsed Time                      00:00:42.415132
Compile Time                      00:00:00.010635
Execution Time                    00:00:42.404151

SQL>get statistics for qid MXID11000047548212348461421382158000000000406U3333300_1248_SQL_CUR_22 default;
...

LC   RC   Id   PaId ExId Frag TDB Name                 DOP     Dispatches      Oper CPU Time  Est. Records Used  Act. Records Used    Details

1    .    2    .    2    0    EX_ROOT                  1                0                  0                  0                  0
.    .    1    2    1    0    HBASE_BULK_LOAD          1                1                  0                  0                  0

--- SQL operation complete.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: