Working with Office and PDF files in Java: reading the contents of Word, Excel, and PDF documents
2015-12-15 13:44
1. Reading Word documents with the POI package
poi.jar download links
http://apache.freelamp.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://apache.etoak.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://labs.renren.com/apache-mirror/poi/release/bin/poi-bin-3.6-20091214.zip
2. Reading Excel documents with the jxl package
Jxl.jar download link
http://nchc.dl.sourceforge.net/project/jexcelapi/CSharpJExcel/CSharpJExcel.zip
3. Reading PDF documents with PDFBox
Pdfbox.jar download links
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/pdfbox-1.1.0.jar
Fontbox.jar download links
http://apache.etoak.com/pdfbox/1.1.0/fontbox-1.1.0.jar
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/fontbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/fontbox-1.1.0.jar
Jempbox.jar download links
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/jempbox-1.1.0.jar
Below is a brief look at how these jars are used to read documents:
The pom coordinates are:
.....
<properties>
    <poi.version>3.13</poi.version>
    <pdf.version>1.8.10</pdf.version>
</properties>
....
<!-- the three POI jars must all use the same version -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${poi.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>${poi.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>${poi.version}</version>
</dependency>
<!-- pdf -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jempbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>net.sourceforge.jexcelapi</groupId>
    <artifactId>jxl</artifactId>
    <version>2.6.12</version>
</dependency>
Code for reading the contents
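import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

// The class name is illustrative; the original post shows only the methods.
public class DocumentTextReader {

    /** Returns the text of a PDF file, or "" if it cannot be read. */
    public static String getPDFContent(File file) {
        String content = null;
        PDDocument document = null;
        try (InputStream in = new FileInputStream(file)) {
            PDFParser parser = new PDFParser(in);
            parser.parse();
            document = parser.getPDDocument();
            content = new PDFTextStripper().getText(document);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // The parsed document holds its own resources and must be closed too.
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return content == null ? "" : content;
    }

    /** Returns the contents of every cell of an .xls workbook, concatenated without separators. */
    public static String getXLSContent(File file) {
        StringBuilder sb = new StringBuilder();
        Workbook workbook = null;
        try (InputStream in = new FileInputStream(file)) {
            workbook = Workbook.getWorkbook(in);
            for (Sheet sheet : workbook.getSheets()) {
                for (int row = 0; row < sheet.getRows(); row++) {
                    for (Cell cell : sheet.getRow(row)) {
                        sb.append(cell.getContents());
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (workbook != null) {
                workbook.close();
            }
        }
        return sb.toString();
    }

    /** Returns the text of a Word 97-2003 (.doc) file, or null if it cannot be read. */
    public static String getWord2003(File file) {
        String word2003 = null;
        // try-with-resources avoids a NullPointerException when opening the file fails
        try (InputStream in = new FileInputStream(file)) {
            word2003 = new WordExtractor(in).getText();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return word2003;
    }

    /** Returns the text of a Word 2007+ (.docx) file, or null if it cannot be read. */
    public static String getWord2007(File file) {
        String word2007 = null;
        OPCPackage opcPackage = null;
        try {
            // openPackage needs the path of the .docx file itself, not its parent directory.
            opcPackage = POIXMLDocument.openPackage(file.getPath());
            POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
            word2007 = extractor.getText();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (opcPackage != null) {
                opcPackage.revert(); // close the package without writing anything back
            }
        }
        return word2007;
    }
}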
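The four methods above each handle one format, so a caller still has to decide which one to invoke. A minimal sketch of one way to do that, choosing by file extension, is given below. ExtractorDispatch, register, extensionOf and extract are hypothetical names introduced here for illustration; they are not part of POI, jxl, or PDFBox, and the sketch uses only the JDK (Java 8 or later).

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: routes a file to a text extractor chosen by its extension. */
final class ExtractorDispatch {
    private final Map<String, Function<File, String>> extractors = new HashMap<>();

    /** Registers an extractor for an extension such as "pdf" (no dot, case-insensitive). */
    ExtractorDispatch register(String extension, Function<File, String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
        return this;
    }

    /** Returns the lower-cased text after the last dot of the file name, or "" if there is none. */
    static String extensionOf(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Runs the registered extractor; returns "" for unknown extensions or a null result. */
    String extract(File file) {
        Function<File, String> extractor = extractors.get(extensionOf(file));
        if (extractor == null) {
            return "";
        }
        String text = extractor.apply(file);
        return text == null ? "" : text;
    }
}
```

Assuming the reading methods are static members of a class called DocumentTextReader (an assumed name), they could be wired up as new ExtractorDispatch().register("pdf", DocumentTextReader::getPDFContent).register("xls", DocumentTextReader::getXLSContent).register("doc", DocumentTextReader::getWord2003).register("docx", DocumentTextReader::getWord2007). An extension can be wrong, so this is only a first-pass choice.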
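I'll also include a method for getting a file's MIME type. It relies on the jMimeMagic library (package net.sf.jmimemagic), which is not among the pom dependencies listed above.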
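import java.io.File;

import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;

/** Returns the detected MIME type, or null if the file is missing or cannot be identified. */
public static String getFileMime(File file) {
    if (!file.exists()) {
        System.err.println("File does not exist");
        return null;
    }
    try {
        MagicMatch match = Magic.getMagicMatch(file, false);
        // Guard against a null match instead of dereferencing it unconditionally.
        return match == null ? null : match.getMimeType();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}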
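MIME detection of this kind works by comparing the first bytes of a file against known signatures. The sketch below illustrates that idea with the JDK alone for the formats covered in this post; it is not how jMimeMagic is implemented, and FormatSniffer and its return labels are hypothetical names. The signatures themselves are standard: PDF files begin with "%PDF", the older .doc and .xls files are OLE2 containers, and .docx and .xlsx files are ZIP archives.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: classifies a file by its leading "magic" bytes. */
final class FormatSniffer {
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns "pdf", "ole2" (.doc/.xls), "zip" (.docx/.xlsx and other ZIPs) or "unknown". */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) return "pdf";
        if (startsWith(head, OLE2)) return "ole2";
        if (startsWith(head, ZIP)) return "zip";
        return "unknown";
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static String sniff(File file) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while (read < head.length
                    && (n = in.read(head, read, head.length - read)) > 0) {
                read += n;
            }
        }
        return sniff(Arrays.copyOf(head, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A "zip" result only says the file is a ZIP container; telling a .docx from an .xlsx or an ordinary archive would require looking inside it, which this sketch does not attempt.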
All rights reserved. Please credit the source when reposting.