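The project's back end uses Spring Boot and Maven, and the front end uses the CKEditor rich text editor. At present the Word file generated from HTML is in doc format, whereas image handling only supports docx, so the doc file has to be re-saved as docx by hand before the images can be substituted in.
1. Add the Maven dependencies
The following POI-related dependencies are used. jsoup is also included to make it easier to extract the image elements from the HTML: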
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>
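2. Convert Word to HTML
Create a static folder under the resources directory of the Spring Boot project and copy the Word file to be converted into it (the code below expects test.doc or test.docx). Because static is one of Spring Boot's default static resource locations, no extra configuration is needed. If you use a different folder name, configure it accordingly in application.yml.
Converting doc format to HTML: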
public static String docToHtml() throws Exception {
    File path = new File(ResourceUtils.getURL("classpath:").getPath());
    String imagePathStr = path.getAbsolutePath() + "\\static\\image\\";
    String sourceFileName = path.getAbsolutePath() + "\\static\\test.doc";
    String targetFileName = path.getAbsolutePath() + "\\static\\test2.html";
    File file = new File(imagePathStr);
    if (!file.exists()) {
        file.mkdirs();
    }
    HWPFDocument wordDocument;
    // The constructor reads the whole stream, so it can be closed right away
    try (FileInputStream in = new FileInputStream(sourceFileName)) {
        wordDocument = new HWPFDocument(in);
    }
    org.w3c.dom.Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
    // Save each picture and return its path relative to the HTML file
    wordToHtmlConverter.setPicturesManager((content, pictureType, name, width, height) -> {
        try (FileOutputStream out = new FileOutputStream(imagePathStr + name)) {
            out.write(content);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "image/" + name;
    });
    wordToHtmlConverter.processDocument(wordDocument);
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(new File(targetFileName));
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    return targetFileName;
}
Converting docx format to HTML:
public static String docxToHtml() throws Exception {
    File path = new File(ResourceUtils.getURL("classpath:").getPath());
    String imagePath = path.getAbsolutePath() + "\\static\\image";
    String sourceFileName = path.getAbsolutePath() + "\\static\\test.docx";
    String targetFileName = path.getAbsolutePath() + "\\static\\test.html";
    // try-with-resources closes both streams even if the conversion fails
    try (FileInputStream in = new FileInputStream(sourceFileName);
         OutputStreamWriter outputStreamWriter =
                 new OutputStreamWriter(new FileOutputStream(targetFileName), "utf-8")) {
        XWPFDocument document = new XWPFDocument(in);
        XHTMLOptions options = XHTMLOptions.create();
        // Folder in which the extracted pictures are stored
        options.setExtractor(new FileImageExtractor(new File(imagePath)));
        // Path used for the pictures inside the HTML
        options.URIResolver(new BasicURIResolver("image"));
        XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
        xhtmlConverter.convert(document, outputStreamWriter, options);
    }
    return targetFileName;
}
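Both converters above build their paths with hard-coded Windows separators ("\\static\\..."), which would not resolve on Linux or macOS. The following is a minimal sketch of a portable alternative using only the JDK. StaticPaths and its resolve method are hypothetical names introduced here for illustration; they are not part of the original project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not from the original project) for paths below static/. */
final class StaticPaths {
    private StaticPaths() {}

    /** Resolves a file or folder below the "static" folder of the given root directory. */
    static Path resolve(String classpathRoot, String... parts) {
        Path p = Paths.get(classpathRoot, "static");
        for (String part : parts) {
            p = p.resolve(part); // the platform's own separator is used
        }
        return p;
    }
}
```

Under these assumptions, the image folder in docxToHtml could be obtained with StaticPaths.resolve(path.getAbsolutePath(), "image").toString() instead of string concatenation.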
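After a successful conversion the corresponding HTML file is generated. To display it in the front end, simply read the file into a String and return it.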
// Requires java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets
public static String readfile(String filePath) {
    try {
        // Read all bytes first and decode once, so that multi-byte UTF-8
        // characters are never split across buffer boundaries
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        return new String(bytes, StandardCharsets.UTF_8);
    } catch (IOException e) {
        // Also covers a missing file, which previously caused a NullPointerException
        e.printStackTrace();
        return "";
    }
}
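[Figure: the converted HTML displayed in the CKEditor rich text editor]
3. Convert HTML to Word
The idea is to first extract every image element from the HTML and replace each one with a placeholder string of the form "${imgReplace}". With several images the placeholders are numbered in order (${imgReplace1}, ${imgReplace2}, and so on). A doc file is then generated. (Generating a docx file directly produced a file that would not open, and no good solution to this has been found yet.) That doc file is re-saved as docx by hand, after which the placeholders can be replaced with the images: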
public static String writeWordFile(String content) {
    String path = "D:/wordFile";
    Map<String, Object> param = new HashMap<String, Object>();
    if (!"".equals(path)) {
        File fileDir = new File(path);
        if (!fileDir.exists()) {
            fileDir.mkdirs();
        }
        content = HtmlUtils.htmlUnescape(content);
        List<HashMap<String, String>> imgs = getImgStr(content);
        int count = 0;
        for (HashMap<String, String> img : imgs) {
            count++;
            // Replace img tags that end with "/>"
            content = content.replace(img.get("img"), "${imgReplace" + count + "}");
            // Replace img tags that end with ">"
            content = content.replace(img.get("img1"), "${imgReplace" + count + "}");
            Map<String, Object> header = new HashMap<String, Object>();
            try {
                File filePath = new File(ResourceUtils.getURL("classpath:").getPath());
                String imagePath = filePath.getAbsolutePath() + "\\static\\";
                imagePath += img.get("src").replaceAll("/", "\\\\");
                // Default to 400*300 when width or height is missing
                if (img.get("width") == null || img.get("height") == null) {
                    header.put("width", 400);
                    header.put("height", 300);
                } else {
                    header.put("width", (int) (Double.parseDouble(img.get("width"))));
                    header.put("height", (int) (Double.parseDouble(img.get("height"))));
                }
                header.put("type", "jpg");
                header.put("content", OfficeUtil.inputStream2ByteArray(new FileInputStream(imagePath), true));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            param.put("${imgReplace" + count + "}", header);
        }
        try {
            // Generate a Word document in doc format; it must be re-saved as docx by hand
            byte[] by = content.getBytes("UTF-8");
            ByteArrayInputStream bais = new ByteArrayInputStream(by);
            POIFSFileSystem poifs = new POIFSFileSystem();
            DirectoryEntry directory = poifs.getRoot();
            DocumentEntry documentEntry = directory.createDocument("WordDocument", bais);
            FileOutputStream ostream = new FileOutputStream("D:\\wordFile\\temp.doc");
            poifs.writeFilesystem(ostream);
            bais.close();
            ostream.close();
            // Temporary file (the docx file that was saved by hand)
            CustomXWPFDocument doc = OfficeUtil.generateWord(param, "D:\\wordFile\\temp.docx");
            // The final Word file with the pictures inserted
            FileOutputStream fopts = new FileOutputStream("D:\\wordFile\\final.docx");
            doc.write(fopts);
            fopts.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    return "D:/wordFile/final.docx";
}

// Collect information about the image elements in the HTML
public static List<HashMap<String, String>> getImgStr(String htmlStr) {
    List<HashMap<String, String>> pics = new ArrayList<HashMap<String, String>>();
    Document doc = Jsoup.parse(htmlStr);
    Elements imgs = doc.select("img");
    for (Element img : imgs) {
        HashMap<String, String> map = new HashMap<String, String>();
        // Strips the last two characters, i.e. assumes a trailing "px"
        if (!"".equals(img.attr("width"))) {
            map.put("width", img.attr("width").substring(0, img.attr("width").length() - 2));
        }
        if (!"".equals(img.attr("height"))) {
            map.put("height", img.attr("height").substring(0, img.attr("height").length() - 2));
        }
        map.put("img", img.toString().substring(0, img.toString().length() - 1) + "/>");
        map.put("img1", img.toString());
        map.put("src", img.attr("src"));
        pics.add(map);
    }
    return pics;
}
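The placeholder step can also be illustrated without jsoup. The sketch below is a JDK-only approximation, not the article's method: it finds img tags with a simple regular expression (adequate for simple editor output, not a general HTML parser) and numbers the placeholders in order. It also shows a more forgiving way to read a dimension, because getImgStr above removes the last two characters of width and height and therefore assumes the value always ends in "px" (a plain "400" would become "4"). ImgPlaceholderSketch, replaceImgs and parseDimension are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, JDK-only illustration; the article's own code uses jsoup instead. */
final class ImgPlaceholderSketch {
    // Matches <img ...> and <img ... />; breaks if an attribute value contains '>'
    private static final Pattern IMG = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Leading integer or decimal number, e.g. the "400" in "400px"
    private static final Pattern NUMBER = Pattern.compile("^\\s*(\\d+(?:\\.\\d+)?)");

    private ImgPlaceholderSketch() {}

    /** Replaces the n-th img tag with ${imgReplaceN} and collects the removed tags in order. */
    static String replaceImgs(String html, List<String> removedTags) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            removedTags.add(m.group());
            // quoteReplacement keeps "$" from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement("${imgReplace" + count + "}"));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Parses "400", "400px" or "12.5px" to an int; returns fallback if there is no leading number. */
    static int parseDimension(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        Matcher m = NUMBER.matcher(value);
        return m.find() ? (int) Double.parseDouble(m.group(1)) : fallback;
    }
}
```

For example, parseDimension(img.attr("width"), 400) would accept both "400" and "400px" and fall back to the article's default width when the attribute is absent or not numeric.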
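The OfficeUtil utility class. The versions found online only support replacing a single image and fail with several. The reason is that adding a picture changes the size of the runs list inside processParagraphs, so continuing to iterate over it throws an exception from the underlying ArrayList (a ConcurrentModificationException), just as removing elements from a list while looping over it does. The fix is to copy the runs into a new ArrayList and iterate over the copy: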
package com.example.demo.util;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

/**
 * Applies to Word 2007 (docx)
 */
public class OfficeUtil {

    /**
     * Generates a Word document from the given parameters and template
     * @param param the placeholders to replace
     * @param template the template file
     */
    public static CustomXWPFDocument generateWord(Map<String, Object> param, String template) {
        CustomXWPFDocument doc = null;
        try {
            OPCPackage pack = POIXMLDocument.openPackage(template);
            doc = new CustomXWPFDocument(pack);
            if (param != null && param.size() > 0) {
                // Process paragraphs
                List<XWPFParagraph> paragraphList = doc.getParagraphs();
                processParagraphs(paragraphList, param, doc);
                // Process tables
                Iterator<XWPFTable> it = doc.getTablesIterator();
                while (it.hasNext()) {
                    XWPFTable table = it.next();
                    List<XWPFTableRow> rows = table.getRows();
                    for (XWPFTableRow row : rows) {
                        List<XWPFTableCell> cells = row.getTableCells();
                        for (XWPFTableCell cell : cells) {
                            List<XWPFParagraph> paragraphListTable = cell.getParagraphs();
                            processParagraphs(paragraphListTable, param, doc);
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return doc;
    }

    /**
     * Processes paragraphs
     * @param paragraphList
     */
    public static void processParagraphs(List<XWPFParagraph> paragraphList, Map<String, Object> param, CustomXWPFDocument doc) {
        if (paragraphList != null && paragraphList.size() > 0) {
            for (XWPFParagraph paragraph : paragraphList) {
                // The spacing produced by the POI conversion is too large and is reset here
                if (paragraph.getSpacingBefore() >= 1000 || paragraph.getSpacingAfter() > 1000) {
                    paragraph.setSpacingBefore(0);
                    paragraph.setSpacingAfter(0);
                }
                // Set the left and right indentation in the Word document
                paragraph.setIndentationLeft(0);
                paragraph.setIndentationRight(0);
                List<XWPFRun> runs = paragraph.getRuns();
                // Adding a picture changes the size of the paragraph's runs, so iterate over a copy
                List<XWPFRun> allRuns = new ArrayList<XWPFRun>(runs);
                for (XWPFRun run : allRuns) {
                    String text = run.getText(0);
                    if (text != null) {
                        boolean isSetText = false;
                        for (Entry<String, Object> entry : param.entrySet()) {
                            String key = entry.getKey();
                            if (text.indexOf(key) != -1) {
                                isSetText = true;
                                Object value = entry.getValue();
                                if (value instanceof String) { // text replacement
                                    text = text.replace(key, value.toString());
                                } else if (value instanceof Map) { // picture replacement
                                    text = text.replace(key, "");
                                    Map<?, ?> pic = (Map<?, ?>) value;
                                    int width = Integer.parseInt(pic.get("width").toString());
                                    int height = Integer.parseInt(pic.get("height").toString());
                                    int picType = getPictureType(pic.get("type").toString());
                                    byte[] byteArray = (byte[]) pic.get("content");
                                    ByteArrayInputStream byteInputStream = new ByteArrayInputStream(byteArray);
                                    try {
                                        String blipId = doc.addPictureData(byteInputStream, picType);
                                        doc.createPicture(blipId, doc.getNextPicNameNumber(picType), width, height, paragraph);
                                    } catch (Exception e) {
                                        e.printStackTrace();
                                    }
                                }
                            }
                        }
                        if (isSetText) {
                            run.setText(text, 0);
                        }
                    }
                }
            }
        }
    }

    /**
     * Returns the picture type code for the given picture type name
     * @param picType
     * @return int
     */
    private static int getPictureType(String picType) {
        int res = CustomXWPFDocument.PICTURE_TYPE_PICT;
        if (picType != null) {
            if (picType.equalsIgnoreCase("png")) {
                res = CustomXWPFDocument.PICTURE_TYPE_PNG;
            } else if (picType.equalsIgnoreCase("dib")) {
                res = CustomXWPFDocument.PICTURE_TYPE_DIB;
            } else if (picType.equalsIgnoreCase("emf")) {
                res = CustomXWPFDocument.PICTURE_TYPE_EMF;
            } else if (picType.equalsIgnoreCase("jpg") || picType.equalsIgnoreCase("jpeg")) {
                res = CustomXWPFDocument.PICTURE_TYPE_JPEG;
            } else if (picType.equalsIgnoreCase("wmf")) {
                res = CustomXWPFDocument.PICTURE_TYPE_WMF;
            }
        }
        return res;
    }

    /**
     * Reads the data of an input stream into a byte array
     * @param in
     * @return
     */
    public static byte[] inputStream2ByteArray(InputStream in, boolean isClose) {
        byte[] byteArray = null;
        try {
            // available() is only an estimate, so read until the end of the stream
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            byteArray = out.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (isClose) {
                try {
                    in.close();
                } catch (Exception e2) {
                    System.out.println("Failed to close the stream");
                }
            }
        }
        return byteArray;
    }
}
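The copy-before-iterating fix in processParagraphs above can be reproduced with plain lists, independent of POI. The following sketch uses a list of strings as a stand-in for a paragraph's runs; CopyBeforeIterate and its method names are hypothetical and only illustrate the general Java behaviour.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical stand-in: a List<String> plays the role of a paragraph's runs. */
final class CopyBeforeIterate {
    private CopyBeforeIterate() {}

    /** Iterates over a snapshot while appending to the original list; returns the final size. */
    static int growWhileIteratingCopy(List<String> runs) {
        List<String> snapshot = new ArrayList<>(runs); // iterate over this copy
        for (String run : snapshot) {
            if (run.startsWith("${imgReplace")) {
                runs.add("picture-run"); // safe: the snapshot is not modified
            }
        }
        return runs.size();
    }

    /** Same loop without the copy; returns true if it fails with ConcurrentModificationException. */
    static boolean failsWithoutCopy(List<String> runs) {
        try {
            for (String run : runs) {
                if (run.startsWith("${imgReplace")) {
                    runs.add("picture-run"); // modifies the list being iterated
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

The snapshot only fixes the iteration. Runs added during the loop are not visited, which is acceptable here because the newly created picture runs contain no placeholders.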
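My understanding is that image replacement is not supported for Word 2003 mainly because HWPFDocument, the class that handles the 2003 format, is declared final, so it cannot be subclassed and its methods cannot be overridden. The class for the 2007 format, XWPFDocument, can be extended. By extending XWPFDocument and defining a custom createPicture method in the subclass, image replacement becomes possible. The corresponding CustomXWPFDocument class is as follows: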
package com.example.demo.util;

import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlToken;
import org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps;
import org.openxmlformats.schemas.drawingml.x2006.main.CTPositiveSize2D;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline;

/**
 * Custom XWPFDocument that provides its own createPicture() method
 */
public class CustomXWPFDocument extends XWPFDocument {

    public CustomXWPFDocument(InputStream in) throws IOException {
        super(in);
    }

    public CustomXWPFDocument() {
        super();
    }

    public CustomXWPFDocument(OPCPackage pkg) throws IOException {
        super(pkg);
    }

    /**
     * @param blipId relationship id of the picture data
     * @param ind picture index, used as the drawing id
     * @param width width in pixels
     * @param height height in pixels
     * @param paragraph paragraph that receives the picture
     */
    public void createPicture(String blipId, int ind, int width, int height, XWPFParagraph paragraph) {
        // Pixels are converted to EMU (English Metric Units)
        final int EMU = 9525;
        width *= EMU;
        height *= EMU;
        CTInline inline = paragraph.createRun().getCTR().addNewDrawing().addNewInline();
        String picXml = ""
                + "<a:graphic xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"
                + "   <a:graphicData uri=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "      <pic:pic xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "         <pic:nvPicPr>"
                + "            <pic:cNvPr id=\"" + ind + "\" name=\"Generated\"/>"
                + "            <pic:cNvPicPr/>"
                + "         </pic:nvPicPr>"
                + "         <pic:blipFill>"
                + "            <a:blip r:embed=\"" + blipId + "\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\"/>"
                + "            <a:stretch>"
                + "               <a:fillRect/>"
                + "            </a:stretch>"
                + "         </pic:blipFill>"
                + "         <pic:spPr>"
                + "            <a:xfrm>"
                + "               <a:off x=\"0\" y=\"0\"/>"
                + "               <a:ext cx=\"" + width + "\" cy=\"" + height + "\"/>"
                + "            </a:xfrm>"
                + "            <a:prstGeom prst=\"rect\">"
                + "               <a:avLst/>"
                + "            </a:prstGeom>"
                + "         </pic:spPr>"
                + "      </pic:pic>"
                + "   </a:graphicData>"
                + "</a:graphic>";
        inline.addNewGraphic().addNewGraphicData();
        XmlToken xmlToken = null;
        try {
            xmlToken = XmlToken.Factory.parse(picXml);
        } catch (XmlException xe) {
            xe.printStackTrace();
        }
        inline.set(xmlToken);
        inline.setDistT(0);
        inline.setDistB(0);
        inline.setDistL(0);
        inline.setDistR(0);
        CTPositiveSize2D extent = inline.addNewExtent();
        extent.setCx(width);
        extent.setCy(height);
        CTNonVisualDrawingProps docPr = inline.addNewDocPr();
        docPr.setId(ind);
        docPr.setName("Picture " + ind);
        docPr.setDescr("Test");
    }
}
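The factor 9525 in createPicture above converts pixels to EMU (English Metric Units), the unit DrawingML uses for sizes. One inch is 914400 EMU, so at an assumed resolution of 96 pixels per inch one pixel is 914400 / 96 = 9525 EMU. The sketch below works through that arithmetic for the article's default size of 400*300. EmuSketch and its members are hypothetical names, and the 96 DPI figure is an assumption about how the pixel values are interpreted.

```java
/** Hypothetical worked example of the pixel-to-EMU conversion. */
final class EmuSketch {
    static final int EMU_PER_INCH = 914_400;
    // Assumption: pixel sizes are measured at 96 pixels per inch
    static final int ASSUMED_DPI = 96;
    static final int EMU_PER_PIXEL = EMU_PER_INCH / ASSUMED_DPI; // 9525

    private EmuSketch() {}

    /** Converts a pixel length to EMU; long avoids int overflow for very large values. */
    static long pixelsToEmu(int pixels) {
        return (long) pixels * EMU_PER_PIXEL;
    }
}
```

With these constants, a 400 pixel wide picture becomes 3810000 EMU and a 300 pixel high one becomes 2857500 EMU, which are the values that end up in the cx and cy attributes.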
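That is how to convert between HTML and Word with POI. The problem that HTML cannot be converted directly into a readable docx file is still unresolved. If you have a good solution, feel free to share it.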