Converting Word to PDF in Java (with third-party libraries: iText, POI, Jsoup)
2016-03-28 17:39
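First, the overall approach:
Step 1: use POI to convert the Word document to HTML. Code for this is all over the web and it all looks the same, so there isn't much to say about it.
One thing to watch out for: when iText (through the Flying Saucer renderer, org.xhtmlrenderer) generates a PDF from HTML, it checks whether the HTML is well-formed. The HTML that POI produces leaves some tags unclosed.
For example: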
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<img src="test/0.jpg" style="width:5.765972in;height:8.647917in;vertical-align:text-bottom;">
This is an excerpt of the HTML I generated directly with POI. The META and img tags are clearly never closed. HTML like this cannot pass iText's check, and the conversion fails with the following exception:
Error: "The element type "meta" must be terminated by the matching end-tag "</meta>"."
org.xhtmlrenderer.util.XRRuntimeException: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException: The element type "meta" must be terminated by the matching end-tag "</meta>".
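The error makes it clear that our HTML is not well-formed. The fix is the third-party Jsoup jar: run the HTML through its parse method and write it back out, and the markup comes out well-formed.
This problem gave me a headache for half a day and turned out to be this easy to fix, so I'm writing this post for anyone else who runs into it!
Here is the POI code that converts Word to HTML: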
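The exception above comes from an XML parser ("Can't load the XML resource ... SAXParseException"). That is why an unclosed META tag is fatal here, even though a browser would accept it. The JDK-only sketch below reproduces the check with a plain DocumentBuilder, so no iText is needed. The class name WellFormedCheck is hypothetical, and the exact error wording may differ between JDK versions and locales.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Parses the markup as XML; returns null if it is well-formed, else the parser's message. */
    public static String xmlError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the POI output above: <META> is never closed
        String poiStyle = "<html><head>"
                + "<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "</head><body></body></html>";
        // Same content with the void elements self-closed, XHTML style
        String xhtmlStyle = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />"
                + "</head><body><img src=\"test/0.jpg\" /></body></html>";
        System.out.println("POI style:   " + xmlError(poiStyle));
        System.out.println("XHTML style: " + xmlError(xhtmlStyle));
    }
}
```

The second string parses because every element is closed. That self-closing form is what the Jsoup cleanup step needs to produce before the file reaches the renderer.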
```java
package com.smart.sys.core.service.io.poi;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Entities;
import org.w3c.dom.Document;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Created by Carey on 15-2-2.
 */
public class Word2Html {

    public static void main(String[] args) {
        try {
            // Input .doc on the author's machine ("New Microsoft Word Document.doc")
            convert2Html("D:\\新建 Microsoft Word 文档.doc", "D:\\1.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Tidy the HTML with Jsoup and write it out as UTF-8
    public static void writeFile(String content, String path) {
        org.jsoup.nodes.Document doc = Jsoup.parse(content);
        // Assumption: from jsoup 1.8.1 the default HTML output syntax writes void elements
        // as <meta ...> without "/>", so request XML syntax and XHTML entity escaping
        // to get markup that the XML parser inside the PDF renderer accepts.
        doc.outputSettings()
                .syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml)
                .escapeMode(Entities.EscapeMode.xhtml);
        // try-with-resources closes the writer (and the underlying stream) even on failure
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            bw.write(doc.html());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    // Convert a Word 97-2003 (.doc) file to HTML
    public static void convert2Html(String fileName, String outPutFile)
            throws TransformerException, IOException, ParserConfigurationException {
        HWPFDocument wordDocument;
        try (InputStream in = new FileInputStream(fileName)) {
            wordDocument = new HWPFDocument(in);
        }
        // HWPF only reads the binary .doc format. Word 2007+ (.docx) files need
        // XWPFDocument from poi-ooxml (XSSFWorkbook is the Excel .xlsx class, not Word).
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Make every <img src> point at test/<name>, relative to the HTML file
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType,
                                      String suggestedName, float widthInches, float heightInches) {
                return "test/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);

        // Save the pictures to D:/test so the relative src values above resolve
        File picDir = new File("D:/test");
        picDir.mkdirs();
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                try (OutputStream picOut = new FileOutputStream(new File(picDir, pic.suggestFullFileName()))) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        // Serialize the generated DOM to a string
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        // "html" output method: writes <META ...> and <img ...> without closing tags
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
        // Decode with the same charset the serializer used, not the platform default
        writeFile(new String(out.toByteArray(), StandardCharsets.UTF_8), outPutFile);
    }
}
```
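The unclosed META and img tags in the excerpt earlier appear to come from the html output method set through OutputKeys.METHOD on the JDK serializer, rather than from POI itself. The html method writes HTML-style markup and, in the JDK's Xalan-based serializer, inserts a META content-type tag into head. The JDK-only sketch below serializes a tiny stand-in DOM with both the html and xml methods for comparison. The class and method names are hypothetical, and the exact whitespace of the output may vary by JDK.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializerOutputDemo {

    /** Builds a minimal html/head/body DOM with one img, similar in shape to POI's output. */
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element img = doc.createElement("img");
            img.setAttribute("src", "test/0.jpg");
            doc.appendChild(html);
            html.appendChild(doc.createElement("head"));
            html.appendChild(body);
            body.appendChild(img);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the DOM with the given output method ("html" or "xml"). */
    public static String serialize(Document doc, String method) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.METHOD, method);
            StringWriter writer = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(serialize(doc, "html")); // img left open, META added to head
        System.out.println(serialize(doc, "xml"));  // img written as a self-closed element
    }
}
```

The html output fails the XML check for the same reason as the excerpt. The xml output self-closes empty elements, which is the property the Jsoup cleanup step has to restore.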
Step 2 is generating the PDF. Here is the code:
```java
package com.smart.sys.core.service.io.itext;

import com.lowagie.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextFontResolver;
import org.xhtmlrenderer.pdf.ITextRenderer;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

/**
 * Created by Carey on 15-2-2.
 */
public class Html2Pdf {

    public boolean convertHtmlToPdf(String inputFile, String outputFile) throws Exception {
        // try-with-resources closes the PDF stream even if rendering fails
        try (OutputStream os = new FileOutputStream(outputFile)) {
            ITextRenderer renderer = new ITextRenderer();
            String url = new File(inputFile).toURI().toURL().toString();
            renderer.setDocument(url);
            // Chinese support: register a font that contains CJK glyphs.
            // simsunb.ttf is SimSun-ExtB (CJK Extension B only), so the regular SimSun
            // collection is registered instead; the HTML's CSS must still select this
            // font through font-family for it to be used.
            ITextFontResolver fontResolver = renderer.getFontResolver();
            fontResolver.addFont("C:/Windows/Fonts/simsun.ttc", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
            // Relative image paths: "test/0.jpg" is resolved against this base URL
            renderer.getSharedContext().setBaseURL("file:/D:/test");
            renderer.layout();
            renderer.createPDF(os);
            os.flush();
        }
        return true;
    }

    public static void main(String[] args) {
        Html2Pdf html2Pdf = new Html2Pdf();
        try {
            html2Pdf.convertHtmlToPdf("D:\\1.html", "D:\\index.pdf");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
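The setBaseURL("file:/D:/test") line works together with the "test/" + suggestedName paths written by Word2Html because of how relative references resolve. Without a trailing slash, the last segment of the base ("test") is dropped before the relative path is appended. This sketch assumes the renderer resolves img src values the standard RFC 3986 way. It uses java.net.URI to show the effect without running iText, and the class name BaseUrlDemo is hypothetical.

```java
import java.net.URI;

public class BaseUrlDemo {

    /** Resolves a relative src against a base URL using standard URI resolution. */
    public static String resolve(String baseUrl, String src) {
        return URI.create(baseUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Base used in Html2Pdf: last segment "test" is replaced, not appended to
        System.out.println(resolve("file:/D:/test", "test/0.jpg"));   // file:/D:/test/0.jpg
        // With a trailing slash the images would be looked up one directory too deep
        System.out.println(resolve("file:/D:/test/", "test/0.jpg"));  // file:/D:/test/test/0.jpg
        // The HTML file's own URL gives the same result as the first case
        System.out.println(resolve("file:/D:/1.html", "test/0.jpg")); // file:/D:/test/0.jpg
    }
}
```

Since the HTML file sits at D:\1.html, its own location already resolves test/0.jpg to D:/test/0.jpg. If the HTML and picture folders are moved, the base URL and the PicturesManager prefix need to change together.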
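For the finer details of the PDF conversion I drew on another developer's code, which explains everything very thoroughly, so I won't repeat it here. Link: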
http://www.open-open.com/lib/view/open1341881830588.html
Required jars:
iText-2.0.8.jar
core-renderer.jar
iTextAsian.jar
iTextAsianCmaps.jar
jsoup-1.8.1.jar
Download: http://download.csdn.net/detail/ptzrbin/8419791