
Working with Office and PDF files in Java: reading the content of Word, Excel, and PDF documents

2015-12-15 13:44

1. Use the POI package to read Word document content

poi.jar download links

http://apache.freelamp.com/poi/release/bin/poi-bin-3.6-20091214.zip

http://apache.etoak.com/poi/release/bin/poi-bin-3.6-20091214.zip

http://labs.renren.com/apache-mirror/poi/release/bin/poi-bin-3.6-20091214.zip

2. Use the jxl package to read Excel document content

Jxl.jar download link

http://nchc.dl.sourceforge.net/project/jexcelapi/CSharpJExcel/CSharpJExcel.zip

3. Use PDFBox to read PDF document content

Pdfbox.jar download links

http://labs.renren.com/apache-mirror/pdfbox/1.1.0/pdfbox-1.1.0.jar

http://apache.etoak.com/pdfbox/1.1.0/pdfbox-1.1.0.jar

http://apache.freelamp.com/pdfbox/1.1.0/pdfbox-1.1.0.jar

Fontbox.jar download links

http://apache.etoak.com/pdfbox/1.1.0/fontbox-1.1.0.jar

http://labs.renren.com/apache-mirror/pdfbox/1.1.0/fontbox-1.1.0.jar

http://apache.freelamp.com/pdfbox/1.1.0/fontbox-1.1.0.jar

Jempbox.jar download links

http://labs.renren.com/apache-mirror/pdfbox/1.1.0/jempbox-1.1.0.jar

http://apache.etoak.com/pdfbox/1.1.0/jempbox-1.1.0.jar

http://apache.freelamp.com/pdfbox/1.1.0/jempbox-1.1.0.jar

Below is a brief look at how these jars can be used to read documents:

The Maven (pom.xml) coordinates are:

.....
<properties>
    <poi.version>3.13</poi.version>
    <pdf.version>1.8.10</pdf.version>
</properties>
....
<!-- The three POI jars must all use the same version -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${poi.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>${poi.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>${poi.version}</version>
</dependency>
<!-- pdf -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jempbox</artifactId>
    <version>${pdf.version}</version>
</dependency>
<dependency>
    <groupId>net.sourceforge.jexcelapi</groupId>
    <artifactId>jxl</artifactId>
    <version>2.6.12</version>
</dependency>


Code for reading the content

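// Needs: java.io.*, org.apache.pdfbox.pdfparser.PDFParser,
// org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.util.PDFTextStripper
public static String getPDFContent(File file) {
    String content = null;
    InputStream f = null;
    PDDocument pdd = null;
    try {
        f = new FileInputStream(file);
        PDFParser p = new PDFParser(f);
        p.parse();
        pdd = p.getPDDocument();
        PDFTextStripper ts = new PDFTextStripper();
        content = ts.getText(pdd);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            // Close the parsed document as well as the stream;
            // both may be null if opening the file failed.
            if (pdd != null) pdd.close();
            if (f != null) f.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return content == null ? "" : content;
}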

// Needs: java.io.*, jxl.Cell, jxl.Sheet, jxl.Workbook
// Note: jxl reads only the binary .xls format, not .xlsx.
public static String getXLSContent(File file) {
    StringBuilder sb = new StringBuilder();
    InputStream f = null;
    Workbook rwb = null;
    try {
        f = new FileInputStream(file);
        rwb = Workbook.getWorkbook(f);
        for (Sheet rs : rwb.getSheets()) {
            for (int j = 0; j < rs.getRows(); j++) {
                Cell[] cells = rs.getRow(j);
                for (Cell cell : cells) {
                    sb.append(cell.getContents());
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Release the workbook, then the stream (either may be null).
        if (rwb != null) rwb.close();
        try {
            if (f != null) f.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // StringBuilder.toString() never returns null, so no null check is needed.
    return sb.toString();
}
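
The loop in getXLSContent appends every cell's text back to back, so cells containing "foo" and "bar" come out as "foobar". If cell and row boundaries matter, one option is to separate cells with a tab and rows with a newline. The sketch below shows only that joining step, on plain String[][] data standing in for the text returned by Cell.getContents(). The names CellJoiner and joinRows are made up for illustration and are not part of jxl.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of jxl: joins already-extracted cell text.
final class CellJoiner {

    // Cells in a row are separated by a tab, rows by a newline.
    static String joinRows(String[][] rows) {
        StringJoiner lines = new StringJoiner("\n");
        for (String[] row : rows) {
            StringJoiner cells = new StringJoiner("\t");
            for (String cell : row) {
                // Treat a missing cell as empty text rather than "null".
                cells.add(cell == null ? "" : cell);
            }
            lines.add(cells.toString());
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        String[][] sheet = {
            {"name", "qty"},
            {"apple", "3"}
        };
        System.out.println(joinRows(sheet));
    }
}
```

Inside getXLSContent the same effect could be had by appending a tab after each cell and a newline after each row.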

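// Needs: java.io.*, org.apache.poi.hwpf.extractor.WordExtractor
public static String getWord2003(File file) {
    String word2003 = null;
    InputStream f = null;
    try {
        f = new FileInputStream(file);
        // WordExtractor handles the binary .doc format (Word 97-2003).
        WordExtractor ex = new WordExtractor(f);
        word2003 = ex.getText();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            // f is null if the file could not be opened.
            if (f != null) f.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return word2003;
}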
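
Each method here closes its stream in a hand-written finally block. On Java 7 or later, try-with-resources does the same job with less code and closes the resource even when the body throws. The sketch below demonstrates this with a stand-in stream that records whether close() was called. CloseDemo, TrackedStream, and countBytes are hypothetical names, and no POI class is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CloseDemo {

    // Stand-in stream that records whether close() was called.
    static final class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Counts the bytes in the stream; returns -1 if reading fails.
    // The stream is closed automatically when the try block exits.
    static int countBytes(InputStream in) {
        try (InputStream s = in) {
            int n = 0;
            while (s.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

    public static void main(String[] args) {
        TrackedStream t = new TrackedStream(new byte[] {1, 2, 3});
        System.out.println(countBytes(t) + " bytes, closed=" + t.closed);
    }
}
```

The same shape would apply to the FileInputStream in the methods here, which would remove the need for the null check before close().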

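// Needs: org.apache.poi.POIXMLDocument, org.apache.poi.POIXMLTextExtractor,
// org.apache.poi.openxml4j.opc.OPCPackage,
// org.apache.poi.xwpf.extractor.XWPFWordExtractor
public static String getWord2007(File file) {
    String word2007 = null;
    OPCPackage opcPackage = null;
    try {
        // Open the .docx file itself; file.getParent() would pass its directory.
        opcPackage = POIXMLDocument.openPackage(file.getPath());
        POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
        word2007 = extractor.getText();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (opcPackage != null) {
            // Close the package without writing anything back to the file.
            opcPackage.revert();
        }
    }
    return word2007;
}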
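
The four methods above each handle one format, so a caller needs some way to choose among them. A minimal sketch of that choice, keyed on the file extension, follows. ReaderKind and ReaderPicker.pick are hypothetical names used only for illustration. The mapping assumes that .doc and .xls are the binary Office 97-2003 formats and that .docx is the OOXML format; it returns UNSUPPORTED for .xlsx because jxl reads only .xls files.

```java
import java.util.Locale;

// Hypothetical names for illustration; each constant maps to one method above.
enum ReaderKind { PDF, XLS, WORD_2003, WORD_2007, UNSUPPORTED }

final class ReaderPicker {

    // Chooses a reader from the file name's extension (case-insensitive).
    static ReaderKind pick(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return ReaderKind.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":  return ReaderKind.PDF;        // getPDFContent
            case "xls":  return ReaderKind.XLS;        // getXLSContent
            case "doc":  return ReaderKind.WORD_2003;  // getWord2003
            case "docx": return ReaderKind.WORD_2007;  // getWord2007
            default:     return ReaderKind.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(pick("report.DOCX"));
    }
}
```

An extension is only a hint, so a caller may still want to handle the case where the chosen reader fails on a mislabeled file.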


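In addition, here is a method for getting a file's MIME type. It uses the jmimemagic library (Magic, MagicMatch), which is not among the pom dependencies listed above.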

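// Needs: java.io.File, net.sf.jmimemagic.Magic, net.sf.jmimemagic.MagicMatch
public static String getFileMime(File file) {
    if (!file.exists()) {
        System.err.println("File does not exist");
        return null;
    }
    MagicMatch match = null;
    try {
        match = Magic.getMagicMatch(file, false);
    } catch (Exception e) {
        e.printStackTrace();
    }
    // match stays null if detection threw, so guard against a NullPointerException.
    return match == null ? null : match.getMimeType();
}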
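
To show the idea behind content-based detection without any extra library, here is a small sketch that checks the first bytes of a file against three well-known signatures: %PDF for PDF, D0 CF 11 E0 A1 B1 1A E1 for the OLE2 container used by .doc and .xls, and PK followed by bytes 03 04 for the ZIP container used by .docx. MagicSniffer and sniff are hypothetical names, and the returned labels are coarse container types, not the MIME strings that jmimemagic would report.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: labels a file's container type from its leading bytes.
final class MagicSniffer {
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if data begins with exactly the bytes in sig.
    static boolean startsWith(byte[] data, byte[] sig) {
        return data.length >= sig.length
            && Arrays.equals(Arrays.copyOf(data, sig.length), sig);
    }

    // head holds the first bytes of the file (8 or more is enough here).
    static String sniff(byte[] head) {
        if (startsWith(head, PDF))  return "pdf";
        if (startsWith(head, OLE2)) return "ole2"; // .doc, .xls
        if (startsWith(head, ZIP))  return "zip";  // .docx and other OOXML files
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(head)); // prints "pdf"
    }
}
```

Telling a .doc from an .xls, or a .docx from any other ZIP file, needs a look inside the container, which is the kind of work a dedicated library is better suited for.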


All rights reserved. Please credit the source when reposting.