Converting Word to HTML with POI

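There are currently four ways to parse a Word document and display it as HTML:

POI (fairly cumbersome)

A third-party plugin (paid, or limited to 500 calls per day on the free tier, which does not suit large companies)

Convert the Word file to PDF and show it on the page with Flash (not great: cumbersome and hard to work with)

HTML5 (I am not very familiar with it)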

 

Having chosen POI, let's get started.

Step 1: add the jars through Maven.

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>

 

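POI reads the two Word formats through different APIs, so an object that works for one version cannot be used for the other. Legacy .doc files (Word 97-2003) are handled by HWPF (HWPFDocument, in poi-scratchpad). .docx files (Word 2007 and later) are handled by XWPF (XWPFDocument, in poi-ooxml). Word 2003 and Word 2007 documents therefore have to be converted with different methods.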
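Because .doc and .docx need different converters, the caller has to decide which one to use. The sketch below assumes you want to decide from the file's first bytes rather than from its extension. Legacy .doc files are OLE2 compound documents, which start with the bytes D0 CF 11 E0 A1 B1 1A E1. .docx files are ZIP packages, which start with "PK\3\4". The class and enum names are hypothetical, and the sketch uses no POI API. A ZIP header only tells you the file is some ZIP archive, so an .xlsx file would also be classified as DOCX_OOXML.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class WordFormatSniffer {

    /** Hypothetical result type: which conversion path to take. */
    public enum WordFormat { DOC_OLE2, DOCX_OOXML, UNKNOWN }

    // OLE2 compound-document signature used by .doc (Word 97-2003)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local-file-header signature "PK\3\4" used by .docx (and any other ZIP file)
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    public static WordFormat detect(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 8 bytes or EOF
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n >= OLE2.length && Arrays.equals(Arrays.copyOf(header, OLE2.length), OLE2)) {
            return WordFormat.DOC_OLE2;
        }
        if (n >= ZIP.length && Arrays.equals(Arrays.copyOf(header, ZIP.length), ZIP)) {
            return WordFormat.DOCX_OOXML;
        }
        return WordFormat.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocxHeader = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(detect(new ByteArrayInputStream(fakeDocxHeader))); // prints DOCX_OOXML
    }
}
```

In practice you would open the file with a BufferedInputStream, call mark(8) before detecting and reset() afterwards. That way the same stream can then be passed to HWPFDocument or XWPFDocument.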

First, converting Word 2007 (.docx):

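// xdocreport 1.0.x packages for the converter classes
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.junit.Test;

    @Test
    public void word2007ToHtml() throws Exception {
        String filepath = "e:/files/";
        String sourceFileName = filepath + "前言.docx";
        String targetFileName = filepath + "1496717486420.html";
        // filepath already ends with "/", so no leading slash here
        String imagePathStr = filepath + "image/";
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream in = new FileInputStream(sourceFileName);
             Writer writer = new OutputStreamWriter(new FileOutputStream(targetFileName), StandardCharsets.UTF_8)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create();
            // Folder on disk where extracted images are written
            options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
            // Prefix used for image paths in the HTML (relative to the HTML file)
            options.URIResolver(new BasicURIResolver("image"));
            XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
            xhtmlConverter.convert(document, writer, options);
        }
    }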
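The two option calls in the .docx method have to agree with each other. FileImageExtractor decides where image files are written on disk. BasicURIResolver decides what prefix ends up in each `<img src>`. A browser resolves that src relative to the HTML file's folder. So with the HTML in e:/files/ and the prefix "image", the images must land under e:/files/image/.

The stdlib-only sketch below checks that the two locations coincide. It rests on two assumptions:
- Images are named by their path inside the .docx package (for example word/media/image1.png).
- The resolver writes "prefix/name" into src.

The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImagePathCheck {

    // Where a browser will look: src is resolved against the HTML file's folder
    public static Path browserLocation(Path htmlFile, String src) {
        return htmlFile.toAbsolutePath().getParent().resolve(src).normalize();
    }

    // Where the extractor writes: its base folder plus the image's own relative name
    public static Path extractedLocation(Path imageDir, String imageName) {
        return imageDir.toAbsolutePath().resolve(imageName).normalize();
    }

    public static void main(String[] args) {
        Path html = Paths.get("files", "1496717486420.html");
        Path imageDir = Paths.get("files", "image");
        String imageName = "word/media/image1.png"; // assumed image name, for illustration
        String src = "image/" + imageName;          // assumed form: resolver prefix + "/" + name
        System.out.println(browserLocation(html, src).equals(extractedLocation(imageDir, imageName))); // prints true
    }
}
```

If you move the image folder, change both the FileImageExtractor directory and the BasicURIResolver prefix together. Otherwise the HTML will reference images that are not there.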

Next, Word 2003 (.doc), which I have not tested myself:

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.junit.Test;
import org.w3c.dom.Document;

    @Test
    public void test() throws Exception {
        docToHtml("E:/files/1496635038432.doc", "E:/files/1496635038432.html");
    }

    // Converts a Word 97-2003 (.doc) file to HTML; images are saved next to the HTML file
    public static void docToHtml(String fileAllName, String outPutFile)
            throws IOException, ParserConfigurationException, TransformerException {
        // Folder of the output HTML file; images go here so <img src="name"> resolves
        final File outDir = new File(outPutFile).getAbsoluteFile().getParentFile();
        // Read the input file into an HWPF document (the stream is closed automatically)
        HWPFDocument wordDocument;
        try (InputStream in = new FileInputStream(fileAllName)) {
            wordDocument = new HWPFDocument(in);
        }
        // Create an empty DOM; the JAXP factory picks the XML parser implementation
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // Converter that fills the DOM with HTML elements
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(dom);
        // Called for each picture: save its bytes to disk and return the src written into <img>
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            @Override
            public String savePicture(byte[] content, PictureType pictureType,
                    String suggestedName, float widthInches, float heightInches) {
                try {
                    Files.write(new File(outDir, suggestedName).toPath(), content);
                } catch (IOException e) {
                    throw new RuntimeException("Could not save picture " + suggestedName, e);
                }
                return suggestedName;
            }
        });
        // Convert the Word document into the DOM, then take the resulting HTML DOM
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();

        // Serialize the DOM as indented HTML in UTF-8
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
        // Decode with the serializer's charset; new String(bytes) would use the platform default
        writeFile(new String(out.toByteArray(), StandardCharsets.UTF_8), outPutFile);
    }

    // Writes the HTML text to disk as UTF-8
    public static void writeFile(String content, String path) throws IOException {
        try (Writer bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            bw.write(content);
        }
    }
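The final step of the .doc method is plain JAXP. A DOM is serialized by a Transformer whose output method is html, and the bytes are decoded into a String. The charset used for decoding must match OutputKeys.ENCODING. Calling `new String(bytes)` without a charset uses the platform default, which garbles Chinese text whenever the two differ.

Below is a minimal stdlib-only sketch of that round trip. A hand-built DOM stands in for the one WordToHtmlConverter would produce. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlString {

    // Serialize a DOM as HTML and decode it with the same charset the serializer used
    public static String serialize(Document dom) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(dom), new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Stand-in DOM: <html><body><p>text</p></body></html>
    public static Document sampleDom(String text) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = dom.createElement("html");
        Element body = dom.createElement("body");
        Element p = dom.createElement("p");
        p.setTextContent(text);
        body.appendChild(p);
        html.appendChild(body);
        dom.appendChild(html);
        return dom;
    }

    public static void main(String[] args) throws Exception {
        // "\u524d\u8a00" is the Chinese word for "preface", written as escapes to avoid source-encoding issues
        System.out.println(serialize(sampleDom("\u524d\u8a00")));
    }
}
```

Any charset works as long as serialization and decoding use the same one. UTF-8 is used here because it can represent every character, and GB2312 cannot.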

 

These two methods convert Word documents to HTML. Note that in IE8 the table borders are not displayed.
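One possible workaround, not verified on IE8, is to post-process the generated HTML and insert an explicit border rule before `</head>`. The sketch below does this with plain string handling. The class name, method name and CSS values are illustrative assumptions. If the HTML has no `</head>`, the style is simply prepended.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableBorderPatch {

    // Illustrative CSS: collapse borders and draw a 1px solid line around every cell
    public static final String STYLE =
        "<style type=\"text/css\">table{border-collapse:collapse;}"
        + "td,th{border:1px solid #000;}</style>";

    // Insert STYLE just before </head> (case-insensitive); prepend it if there is no </head>
    public static String addTableBorders(String html) {
        Matcher m = Pattern.compile("</head>", Pattern.CASE_INSENSITIVE).matcher(html);
        if (!m.find()) {
            return STYLE + html;
        }
        return html.substring(0, m.start()) + STYLE + html.substring(m.start());
    }

    public static void main(String[] args) {
        System.out.println(addTableBorders(
            "<html><head></head><body><table><tr><td>1</td></tr></table></body></html>"));
    }
}
```

The trade-off is that this rule also draws borders on tables that were borderless in the original Word document. A narrower selector would be needed if that matters.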

I will keep improving these methods.

posted @ 2017-06-06 14:11  酒皇