Solving large-data imports with POI
2016-03-03 15:59
This one comes up quite a lot, but often the reason isn't what you might initially think. So, the first thing to check is - what's the source of the problem? Your file? Your code? Your environment? Or Apache POI?
(If you're here, you probably think it's Apache POI. However, it often isn't! A moderate laptop with a decent but not excessive heap size can, from a standing start, normally read or write a file with 100 columns and 100,000 rows in under a couple of seconds, including the time to start the JVM.)
Apache POI ships with a few programs and examples that can be used for basic performance checks. For testing file generation, the class to use is SSPerformanceTest, in the examples package. Run SSPerformanceTest with four arguments: the writing type (HSSF, XSSF or SXSSF), the number of rows, the number of columns, and whether the file should be saved. If you can't run that with 50,000 rows and 50 columns in under 3 seconds for HSSF and SXSSF, and in under 10 seconds for XSSF (and ideally all 3 in less than that!), then the problem is with your environment.
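If the benchmark points at the environment, one of the first things worth checking is the JVM itself, in particular the heap size mentioned above. The following is a minimal sketch using only the JDK; `EnvCheck` and its methods are hypothetical names for illustration and are not part of Apache POI.

```java
import java.util.Locale;

// Hypothetical helper (not part of Apache POI): reports basic JVM settings
// that influence how quickly large spreadsheets can be processed.
class EnvCheck {

    // Maximum heap the JVM will attempt to use, in MiB (set with -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary of the Java version, CPU count and heap limit.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.format(Locale.ROOT,
                "java.version=%s, processors=%d, maxHeapMiB=%d",
                System.getProperty("java.version"),
                rt.availableProcessors(),
                maxHeapMiB());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same JVM options as your application (for example the same `-Xmx` value), so the numbers reflect the environment the import actually runs in.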
Next, use the example program ToCSV to try reading a file in with HSSF or XSSF. Related is XLSX2CSV, which uses SAX parsing for .xlsx files. Run this against both your problem file and a simple one of the same size generated by SSPerformanceTest. If it is slow on your file, then there could be an Apache POI problem with how the file is being processed (POI makes some assumptions that might not be right for all files). If these tests are fast, then any performance problems are in your code!
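To compare the two runs, or to find the slow stage in your own code once POI has been ruled out, a small timing helper is usually enough. This is only a sketch using the JDK; `Timing`, `timeMillis` and `busyWork` are hypothetical names, and the workloads in `main` are placeholders to be replaced with the code that reads your generated baseline file and your problem file.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Apache POI) for timing a unit of work.
class Timing {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Stand-in workload: sums the integers 0..n-1. Replace with real file handling.
    static long busyWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Placeholder: time reading the file generated by SSPerformanceTest here.
        long baselineMs = timeMillis(() -> busyWork(1_000_000));
        // Placeholder: time reading your problem file here.
        long problemMs = timeMillis(() -> busyWork(5_000_000));

        System.out.println("baseline: " + baselineMs + " ms");
        System.out.println("problem file: " + problemMs + " ms");
    }
}
```

A single run includes JVM warm-up, so treat the numbers as a rough comparison. A large gap between the baseline and the problem file is the signal to look at, not the absolute values.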