kettle spoon excel
2016-07-08 15:31
183 查看
使用kettle spoon etl 工具将oracle中大表抽取到excel中
因为数据库表中有clob字段,整个表数据量500万左右,表大小有5.3G,
根据业务需求将数据准备到另外新的表中,create table * as select ***。这样kettle查询速度快点,避免一堆关联查询。
然后采用本地kettle 编辑好脚本,测试发现本地抽取到excel效率只有40条/秒
果断放到服务器上面跑kettle的脚本。喜由天降,每分钟4w条的速度还是可以接受的。160w数据轻松导出!
但是有遇到一些问题,数据导出超过65536条就报下面错误:
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: #!/bin/sh: No such file or directory
2016/07/08 12:54:54 - Kitchen - Start of run.
2016/07/08 12:54:54 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 12:54:57 - JOB_export_nielsen_info - Start of job execution
2016/07/08 12:54:57 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 12:54:58 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 12:55:38 - 获取nielsen_info数据.0 - linenr 50000
2016/07/08 12:55:38 - Excel输出.0 - linenr 50000
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Error writing field (0,65536) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.getRowRecord(WritableSheetImpl.java:1214)
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.addCell(WritableSheetImpl.java:1151)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:391)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:284)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeRowToFile(ExcelOutput.java:206)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.processRow(ExcelOutput.java:147)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 12:55:50 - Excel输出.0 - at java.lang.Thread.run(Thread.java:745)
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 12:55:50 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished processing (I=65642, O=0, R=0, W=65640, U=0, E=0)
2016/07/08 12:55:56 - Excel输出.0 - Finished processing (I=0, O=65535, R=65536, W=65535, U=0, E=1)
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 12:55:56 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:56 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 12:55:56 - JOB_export_nielsen_info - Job execution finished
2016/07/08 12:55:56 - Kitchen - Finished!
2016/07/08 12:55:56 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 12:55:56 - Kitchen - Start=2016/07/08 12:54:54.136, Stop=2016/07/08 12:55:56.806
2016/07/08 12:55:56 - Kitchen - Processing ended after 1 minutes and 2 seconds (62 seconds total).
经过网上查询说是2003excel版本每个sheet只能支持65536行。
之后在kettle输出控件选为microsoft excel
扩展名选为2007版本的;并且勾上Stream XSLX DATA 选项,支持流的形式写文件,不然大数量的时候写入不了excel,报内存溢出。
内存溢出GC啦!
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: ?#!/bin/sh: No such file or directory
2016/07/08 13:23:16 - Kitchen - Start of run.
2016/07/08 13:23:16 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 13:23:29 - JOB_export_nielsen_info - Start of job execution
2016/07/08 13:23:29 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 13:23:29 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : UnexpectedError:
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : java.lang.OutOfMemoryError: GC overhead limit exceeded
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1695)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1187)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:959)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:476)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1859)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1489)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.getKey(SharedStringsTable.java:130)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:176)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:345)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:315)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeField(ExcelWriterStep.java:514)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeNextLine(ExcelWriterStep.java:300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.processRow(ExcelWriterStep.java:163)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at java.lang.Thread.run(Thread.java:745)
child index = 2, logging object : org.pentaho.di.core.logging.LoggingObject@6ec0657a parent=3920a2a8-6de3-43f9-88b1-1214e926bfa3
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - Finished processing (I=0, O=26908, R=26908, W=26907, U=0, E=1)
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished processing (I=27012, O=0, R=0, W=27010, U=0, E=0)
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 13:25:17 - JOB_export_nielsen_info - Job execution finished
2016/07/08 13:25:17 - Kitchen - Finished!
2016/07/08 13:25:17 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 13:25:17 - Kitchen - Start=2016/07/08 13:23:16.423, Stop=2016/07/08 13:25:17.143
2016/07/08 13:25:17 - Kitchen - Processing ended after 2 minutes and 0 seconds (120 seconds total).
因为数据库表中有clob字段,整个表数据量500万左右,表大小有5.3G,
根据业务需求将数据准备到另外新的表中,create table * as select ***。这样kettle查询速度快点,避免一堆关联查询。
然后采用本地kettle 编辑好脚本,测试发现本地抽取到excel效率只有40条/秒
果断放到服务器上面跑kettle的脚本。喜由天降,每分钟4w条的速度还是可以接受的。160w数据轻松导出!
但是有遇到一些问题,数据导出超过65536条就报下面错误:
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: #!/bin/sh: No such file or directory
2016/07/08 12:54:54 - Kitchen - Start of run.
2016/07/08 12:54:54 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 12:54:57 - JOB_export_nielsen_info - Start of job execution
2016/07/08 12:54:57 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 12:54:58 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 12:55:38 - 获取nielsen_info数据.0 - linenr 50000
2016/07/08 12:55:38 - Excel输出.0 - linenr 50000
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Error writing field (0,65536) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.getRowRecord(WritableSheetImpl.java:1214)
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.addCell(WritableSheetImpl.java:1151)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:391)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:284)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeRowToFile(ExcelOutput.java:206)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.processRow(ExcelOutput.java:147)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 12:55:50 - Excel输出.0 - at java.lang.Thread.run(Thread.java:745)
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 12:55:50 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished processing (I=65642, O=0, R=0, W=65640, U=0, E=0)
2016/07/08 12:55:56 - Excel输出.0 - Finished processing (I=0, O=65535, R=65536, W=65535, U=0, E=1)
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 12:55:56 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:56 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 12:55:56 - JOB_export_nielsen_info - Job execution finished
2016/07/08 12:55:56 - Kitchen - Finished!
2016/07/08 12:55:56 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 12:55:56 - Kitchen - Start=2016/07/08 12:54:54.136, Stop=2016/07/08 12:55:56.806
2016/07/08 12:55:56 - Kitchen - Processing ended after 1 minutes and 2 seconds (62 seconds total).
经过网上查询说是2003excel版本每个sheet只能支持65536行。
之后在kettle输出控件选为microsoft excel
扩展名选为2007版本的;并且勾上Stream XSLX DATA 选项,支持流的形式写文件,不然大数量的时候写入不了excel,报内存溢出。
内存溢出GC啦!
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: ?#!/bin/sh: No such file or directory
2016/07/08 13:23:16 - Kitchen - Start of run.
2016/07/08 13:23:16 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 13:23:29 - JOB_export_nielsen_info - Start of job execution
2016/07/08 13:23:29 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 13:23:29 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : UnexpectedError:
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : java.lang.OutOfMemoryError: GC overhead limit exceeded
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1695)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1187)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:959)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:476)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1859)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1489)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.getKey(SharedStringsTable.java:130)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:176)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:345)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:315)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeField(ExcelWriterStep.java:514)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeNextLine(ExcelWriterStep.java:300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.processRow(ExcelWriterStep.java:163)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at java.lang.Thread.run(Thread.java:745)
child index = 2, logging object : org.pentaho.di.core.logging.LoggingObject@6ec0657a parent=3920a2a8-6de3-43f9-88b1-1214e926bfa3
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - Finished processing (I=0, O=26908, R=26908, W=26907, U=0, E=1)
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished processing (I=27012, O=0, R=0, W=27010, U=0, E=0)
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 13:25:17 - JOB_export_nielsen_info - Job execution finished
2016/07/08 13:25:17 - Kitchen - Finished!
2016/07/08 13:25:17 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 13:25:17 - Kitchen - Start=2016/07/08 13:23:16.423, Stop=2016/07/08 13:25:17.143
2016/07/08 13:25:17 - Kitchen - Processing ended after 2 minutes and 0 seconds (120 seconds total).
相关文章推荐
- 使用Python生成Excel格式的图片
- Excel 曝出 Power Query 安全漏洞,1.2 亿用户易受远程 DDE 攻击
- VBA将excel数据表生成JSON文件
- excel vba 限制工作表的滚动区域代码
- excel vba 高亮显示当前行代码
- C#控制Excel Sheet使其自适应页宽与列宽的方法
- SQL 导入导出Excel数据的语句
- 文本、Excel、Access数据导入SQL Server2000的方法
- C#实现Excel动态生成PivotTable
- C#基于COM方式读取Excel表格的方法
- C#导出数据到Excel文件的方法
- Vbscript生成Excel报表的常用操作总结
- C#实现导入CSV文件到Excel工作簿的方法
- C#基于NPOI生成具有精确列宽行高的Excel文件的方法
- 总提示[Microsoft][ODBC Excel Driver] 数值字段溢出官方解决方法
- C#实现DataSet内数据转化为Excel和Word文件的通用类完整实例
- C#将Sql数据保存到Excel文件中的方法
- VC6.0实现读取Excel数据的方法
- C#定制Excel界面并实现与数据库交互的方法
- 把excel表格里的数据导入sql数据库的两种方法