Trafodion数据加载性能测试
2016-12-15 09:37
155 查看
在之前写的另外一篇文章“Trafodion 数据加载介绍”中提到,Trafodion数据加载主要有以下四种方式,
Insert
Upsert
Upsert Using Load
Load
从性能上面来看,这四种加载方式是依次递增的,即Load >Upsert Using Load >Upsert >Insert。
下面我们就现场验证一下这几种数据加载方式各自的性能,我所使用的集群配置为:4个节点的Hadoop,Trafodion装在3个节点的RegionServer上,每个Node的内存为132G。现在通过以下步骤来测试性能,
数据准备—创建两张表相同结构的表,一个表3千万数据,另外一张为空表
Insert测试—由于insert语句最慢,我们通过qid拿到当前insert语句的统计信息发现,在11分45秒的时候约插入20万条记录,这意味着,每秒插入数据约300条。如果3千万全部插入,则约需要耗时27个小时。
Upsert测试—同样使用上述方法,拿到upsert执行语句的统计信息,发现在6分钟的时候已经插入了将近9百万的数据了,以此估算每秒约插入数据约2.5万条,全部插入约需20分钟。
Upsert Using Load测试,从下面的结果可以看出,导入3千万的数据花了12分钟,这比upsert语句要快将近一倍,相当于每秒4万多条记录。
Load测试,通过以下结果可以发现,load最快,从统计信息可以看出它走的是hbase bulk load,总用时花了42秒,相当于每秒70万条记录,这远远超过了上述三种加载方式。
Insert
Upsert
Upsert Using Load
Load
从性能上面来看,这四种加载方式是依次递增的,即Load >Upsert Using Load >Upsert >Insert。
下面我们就现场验证一下这几种数据加载方式各自的性能,我所使用的集群配置为:4个节点的Hadoop,Trafodion装在3个节点的RegionServer上,每个Node的内存为132G。现在通过以下步骤来测试性能,
数据准备—创建两张表相同结构的表,一个表3千万数据,另外一张为空表
SQL>select count(*) from eboxdata_30m; (EXPR) -------------------- 30000000 --- 1 row(s) selected. SQL>create table eboxdata_30m_new like eboxdata_30m; --- SQL operation complete.
Insert测试—由于insert语句最慢,我们通过qid拿到当前insert语句的统计信息发现,在11分45秒的时候约插入20万条记录,这意味着,每秒插入数据约300条。如果3千万全部插入,则约需要耗时27个小时。
SQL>insert into eboxdata_30m_new select * from eboxdata_30m; SQL>get statistics for qid MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7 default; .... Qid MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7 Compile Start Time 2016/12/14 13:11:24.930565 Compile End Time 2016/12/14 13:11:26.585146 Compile Elapsed Time 0:00:01.654581 Execute Start Time 2016/12/14 13:11:26.585298 Execute End Time -1 Execute Elapsed Time 0:11:45.164579 State OPEN Rows Affected -1 SQL Error Code 0 Stats Error Code 0 Query Type SQL_INSERT_NON_UNIQUE Sub Query Type SQL_STMT_NA Estimated Accessed Rows 0 Estimated Used Rows 0 Parent Qid NONE Parent Query System NONE Child Qid NONE Number of SQL Processes 19 Number of Cpus 3 Transaction Id 4294967298 Source String insert into eboxdata_30m_new select * from eboxdata_30m SQL Source Length 55 Rows Returned 0 First Row Returned Time -1 Last Error before AQR 0 Number of AQR retries 0 Delay before AQR 0 No. of times reclaimed 0 Cancel Time -1 Last Suspend Time -1 Stats Collection Type OPERATOR_STATS LC RC Id PaId ExId Frag TDB Name DOP Dispatches Oper CPU Time Est. Records Used Act. Records Used Details 12 . 13 . 7 0 EX_ROOT 1 1 29 0 0 1174 11 . 12 13 6 0 EX_SPLIT_TOP 1 1 7 29,999,970 0 10 . 11 12 6 0 EX_SEND_TOP 6 18 1,138 29,999,970 0 9 . 10 11 6 2 EX_SEND_BOTTOM 6 12 558 29,999,970 0 8 . 9 10 6 2 EX_SPLIT_BOTTOM 6 48 1,280 29,999,970 0 261561976 6 7 8 9 5 2 EX_TUPLE_FLOW 6 3,177 160,763 29,999,970 0 . . 7 8 4 2 EX_TRAF_INSERT 6 3,171 39,525,341 1 197,967 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|54230353 5 . 6 8 3 2 EX_SORT 6 1,571,909 149,789,880 29,999,970 200,388 704|89924|24 4 . 5 6 2 2 EX_SPLIT_TOP 6 1,568,842 4,135,959 29,999,970 29,999,833 3 . 4 5 2 2 EX_SEND_TOP 72 1,579,269 67,948,195 29,999,970 29,999,833 2 . 3 4 2 3 EX_SEND_BOTTOM 72 3,158,220 457,945,988 29,999,970 29,999,909 1 . 2 3 2 3 EX_SPLIT_BOTTOM 12 3,157,637 9,283,550 29,999,970 30,000,000 628902186 . . 1 2 1 3 EX_TRAF_KEY_SELECT 23 3,026,037 309,730,635 29,999,970 30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500 --- SQL operation complete.
Upsert测试—同样使用上述方法,拿到upsert执行语句的统计信息,发现在6分钟的时候已经插入了将近9百万的数据了,以此估算每秒约插入数据约2.5万条,全部插入约需20分钟。
SQL>control query cancel qid MXID11001049697212347845223497595000000000706U3333300_1609_SQL_CUR_7; SQL>upsert into eboxdata_30m_new select * from eboxdata_30m; SQL>get statistics for qid MXID11001049515212347845223198599000000000206U3333300_522_SQL_CUR_29 default; .... Qid MXID11001049515212347845223198599000000000206U3333300_522_SQL_CUR_29 Compile Start Time 2016/12/14 13:35:00.893745 Compile End Time 2016/12/14 13:35:07.080435 Compile Elapsed Time 0:00:06.186690 Execute Start Time 2016/12/14 13:35:07.080609 Execute End Time -1 Execute Elapsed Time 0:06:09.589123 State OPEN Rows Affected -1 SQL Error Code 0 Stats Error Code 0 Query Type SQL_INSERT_NON_UNIQUE Sub Query Type SQL_STMT_NA Estimated Accessed Rows 0 Estimated Used Rows 0 Parent Qid NONE Parent Query System NONE Child Qid NONE Number of SQL Processes 19 Number of Cpus 3 Transaction Id 4294967299 Source String upsert into eboxdata_30m_new select * from eboxdata_30m SQL Source Length 55 Rows Returned 0 First Row Returned Time -1 Last Error before AQR 0 Number of AQR retries 0 Delay before AQR 0 No. of times reclaimed 0 Cancel Time -1 Last Suspend Time -1 Stats Collection Type OPERATOR_STATS LC RC Id PaId ExId Frag TDB Name DOP Dispatches Oper CPU Time Est. Records Used Act. Records Used Details 12 . 13 . 7 0 EX_ROOT 1 1 30 0 0 1382 11 . 12 13 6 0 EX_SPLIT_TOP 1 1 31 29,999,970 0 10 . 11 12 6 0 EX_SEND_TOP 6 18 1,321 29,999,970 0 9 . 10 11 6 2 EX_SEND_BOTTOM 6 12 517 29,999,970 0 8 . 9 10 6 2 EX_SPLIT_BOTTOM 6 48 1,106 29,999,970 0 276976607 6 7 8 9 5 2 EX_TUPLE_FLOW 6 141,906 1,161,857 29,999,970 0 . . 7 8 4 2 EX_TRAF_VSBB_UPSERT 6 141,900 34,293,113 1 8,928,848 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|2454851950 5 . 6 8 3 2 EX_SORT 6 1,702,355 167,387,942 29,999,970 8,936,375 27163|89932|24 4 . 5 6 2 2 EX_SPLIT_TOP 6 1,560,567 4,261,383 29,999,970 29,999,819 3 . 4 5 2 2 EX_SEND_TOP 72 1,579,262 69,870,689 29,999,970 29,999,819 2 . 3 4 2 3 EX_SEND_BOTTOM 72 3,158,252 460,964,057 29,999,970 29,999,895 1 . 2 3 2 3 EX_SPLIT_BOTTOM 12 3,157,635 9,503,227 29,999,970 30,000,000 635988911 . . 1 2 1 3 EX_TRAF_KEY_SELECT 23 3,026,053 317,312,896 29,999,970 30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500 --- SQL operation complete.
Upsert Using Load测试,从下面的结果可以看出,导入3千万的数据花了12分钟,这比upsert语句要快将近一倍,相当于每秒4万多条记录。
SQL>upsert using load into eboxdata_30m_new select * from eboxdata_30m; .... --- 30000000 row(s) inserted. Start Time 2016/12/15 09:16:22.879596 End Time 2016/12/15 09:28:23.098801 Elapsed Time 00:12:00.219205 Compile Time 00:00:00.046010 Execution Time 00:12:00.172971 Table Name Records Records Hdfs Hdfs I/O Hdfs Access Accessed Used I/Os Bytes Time(usec) TRAFODION.PKSAAS.EBOXDATA_30M 30000000 30000000 3017 7527968500 9287063 TRAFODION.PKSAAS.EBOXDATA_30M_NEW 0 30000000 247956 8248464412 889703853 SQL>get statistics for qid MXID11000047548212348461421382158000000000406U3333300_1108_SQL_CUR_14 default; .... LC RC Id PaId ExId Frag TDB Name DOP Dispatches Oper CPU Time Est. Records Used Act. Records Used Details 12 . 13 . 7 0 EX_ROOT 1 1 41 0 0 1133 11 . 12 13 6 0 EX_SPLIT_TOP 1 1 6 29,999,970 0 10 . 11 12 6 0 EX_SEND_TOP 6 18 1,086 29,999,970 0 9 . 10 11 6 2 EX_SEND_BOTTOM 6 12 399 29,999,970 0 8 . 9 10 6 2 EX_SPLIT_BOTTOM 6 43 447 29,999,970 0 268009646 6 7 8 9 5 2 EX_TUPLE_FLOW 6 190,246 1,046,102 29,999,970 0 . . 7 8 4 2 EX_TRAF_VSBB_UPSERT_LOAD 6 190,238 29,003,168 1 5,751,963 TRAFODION.PKSAAS.EBOXDATA_30M_NEW|0|1581492354 5 . 6 8 3 2 EX_SORT 6 1,753,154 164,131,820 29,999,970 5,752,791 17520|89908|24 4 . 5 6 2 2 EX_SPLIT_TOP 6 1,563,018 4,278,627 29,999,970 29,999,835 3 . 4 5 2 2 EX_SEND_TOP 72 1,579,278 69,549,083 29,999,970 29,999,835 2 . 3 4 2 3 EX_SEND_BOTTOM 72 3,158,237 459,331,880 29,999,970 29,999,911 1 . 2 3 2 3 EX_SPLIT_BOTTOM 12 3,157,643 9,414,684 29,999,970 30,000,000 590013078 . . 1 2 1 3 EX_TRAF_KEY_SELECT 23 3,025,840 232,149,739 29,999,970 30,000,000 TRAFODION.PKSAAS.EBOXDATA_30M|30000000|7527968500 --- SQL operation complete.
Load测试,通过以下结果可以发现,load最快,从统计信息可以看出它走的是hbase bulk load,总用时花了42秒,相当于每秒70万条记录,这远远超过了上述三种加载方式。
SQL>load into eboxdata_30m_new select * from eboxdata_30m; UTIL_OUTPUT -------------------------------------------------------------------------------------------------------------------------------- Task: LOAD Status: Started Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW Task: CLEANUP Status: Started Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW Task: CLEANUP Status: Ended Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW Task: PREPARATION Status: Started Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW Rows Processed: 30000000 Task: PREPARATION Status: Ended ET: 00:00:39.754 Task: COMPLETION Status: Started Object: TRAFODION.PKSAAS.EBOXDATA_30M_NEW Task: COMPLETION Status: Ended ET: 00:00:01.852 --- SQL operation complete. Start Time 2016/12/15 09:33:20.473559 End Time 2016/12/15 09:34:02.888691 Elapsed Time 00:00:42.415132 Compile Time 00:00:00.010635 Execution Time 00:00:42.404151 SQL>get statistics for qid MXID11000047548212348461421382158000000000406U3333300_1248_SQL_CUR_22 default; ... LC RC Id PaId ExId Frag TDB Name DOP Dispatches Oper CPU Time Est. Records Used Act. Records Used Details 1 . 2 . 2 0 EX_ROOT 1 0 0 0 0 . . 1 2 1 0 HBASE_BULK_LOAD 1 1 0 0 0 --- SQL operation complete.
相关文章推荐
- 数据加载的性能测试
- Trafodion 性能优化之加载数据
- HFSoft.Data2.0数据持久组件性能简要测试结果
- 影响性能的测试报告(数据库版)测试源代码
- 批量数据结构缓冲加载数据二叉树查找的效率优化测试
- 使用MondrianFoodMartLoader加载foodmart测试数据
- java环境下的axis与xfire性能测试数据
- 在linux下测试memcached读取MySQL数据库数据的性能
- Sql语句与存储过程查询数据的性能测试实现代码
- Sql语句与存储过程查询数据的性能测试实现代码
- Sql Server 的性能测试数据的陷阱
- Oracle性能优化实战 修改版(改正原有测试的指标数据)
- 性能测试中批量数据制作实例的多种方法讨论
- 插入10000条数据测试DB性能
- 使用AWK构造性能测试数据
- 网络游戏之性能测试规划(二)——如何建立有效的基准数据及环境(拟稿)
- 影响性能的测试报告(数据库版)
- 提供一个oracle9i export,import dmp性能测试参考数据
- 【软件性能测试-LoadRunner实战技能 4】== 监控指标数据分析
- 浅谈性能测试数据(Data Environment)的设计