Generating Reports Automatically from Document Templates with Open XML

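The Open XML SDK is a library from Microsoft for editing and manipulating MS Office documents. With it we can create and edit Office documents programmatically. There is a version requirement, though: only the file formats of Office 2007 and later are supported.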

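Open XML SDK download: see the link in the original post

Microsoft documentation: http://msdn.microsoft.com/zh-cn/library/bb448854.aspx

Source code for this article: see the download link in the original post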

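Starting with Office 2007, Microsoft moved the Office suite to a new, XML-based file format. If we append a .zip extension to a Word 2007 or Word 2010 document and open it with an archive tool, we can see that the document contains a set of XML files, as shown in the figure below: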

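The figure above shows what a Word document is made of. The word directory holds the key content: word/media contains the media resources the document uses, such as pictures and audio; word/theme contains the document's theme definitions, such as fonts, a bit like a website's CSS file; word/document.xml contains the actual content, such as text, layout, and picture references, and is the part we focus on. Below is word/document.xml for a document that contains a single line, "罗朝辉的blog:" followed by a hyperlink: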

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
            xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
            xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
            xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
            xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
            xmlns:w10="urn:schemas-microsoft-com:office:word"
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
            xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
            xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
            xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
            xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
            mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="00111330" w:rsidRDefault="000D4700">
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>罗朝辉的</w:t>
      </w:r>
      <w:proofErr w:type="spellStart" />
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>blog</w:t>
      </w:r>
      <w:r w:rsidR="00984A94">
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>:</w:t>
      </w:r>
      <w:hyperlink r:id="rId5" w:history="1">
        <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
          <w:rPr>
            <w:rStyle w:val="a3" />
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>http</w:t>
        </w:r>
        <w:proofErr w:type="spellEnd" />
        <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
          <w:rPr>
            <w:rStyle w:val="a3" />
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>://kesalin.cnblogs.com</w:t>
        </w:r>
      </w:hyperlink>
      <w:bookmarkStart w:id="0" w:name="_GoBack" />
      <w:bookmarkEnd w:id="0" />
    </w:p>
    <w:sectPr w:rsidR="00111330">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0" />
      <w:cols w:space="425" />
      <w:docGrid w:type="lines" w:linePitch="312" />
    </w:sectPr>
  </w:body>
</w:document>
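
The same package can also be inspected from code instead of renaming the file by hand. The following is a minimal sketch that uses only the .NET base class library (System.IO.Compression and System.Xml.Linq), not the Open XML SDK. The class and method names (DocxPackageInspector, ListParts, ReadPartXml) are made up for illustration and are not part of the article's source code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxPackageInspector
{
    // Returns the names of all entries (parts) in the package,
    // for example "word/document.xml".
    public static List<string> ListParts(Stream packageStream)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Returns the XML of one part with indentation applied,
    // or null when the package has no entry with that name.
    public static string ReadPartXml(Stream packageStream, string partName)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(partName);
            if (entry == null)
            {
                return null;
            }

            using (Stream partStream = entry.Open())
            {
                // XDocument.ToString() indents the markup by default
                return XDocument.Load(partStream).ToString();
            }
        }
    }
}
```

A call such as DocxPackageInspector.ReadPartXml(File.OpenRead("Template.docx"), "word/document.xml") would return the main document part as indented text, assuming the file exists at that path.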

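The XML above looks cluttered. Viewed through the Open XML SDK tool, the structure is clear at a glance:

The view above makes the structure of a Word document easy to see. A Word document contains one main document element, which contains a body element. The body contains paragraph elements or table elements. A paragraph contains run elements, and a run contains a text element. A table contains tableRow elements, a tableRow contains tableCell elements, and a tableCell is a container that can hold paragraphs (and, through them, runs). For the detailed hierarchy, see the MSDN article "控制 Open XML WordprocessingML 文档中文本" (on controlling text in Open XML WordprocessingML documents).

With this background in place, on to the main topic: how to create a document template and modify its content programmatically. This article covers only replacing text and pictures.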
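
The body -> paragraph -> run -> text hierarchy can be walked directly with the SDK's typed classes. The sketch below is an illustration, not code from the article's download: the DocumentOutline class and its ParagraphTexts method are hypothetical names, and the code assumes the DocumentFormat.OpenXml package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocumentOutline
{
    // Returns one string per paragraph, built by concatenating the Text
    // elements of every run found inside that paragraph.
    public static List<string> ParagraphTexts(WordprocessingDocument doc)
    {
        Body body = doc.MainDocumentPart.Document.Body;

        return body.Descendants<Paragraph>()
                   .Select(p => string.Concat(
                       p.Descendants<Run>()              // includes runs inside hyperlinks
                        .SelectMany(r => r.Elements<Text>())
                        .Select(t => t.Text)))
                   .ToList();
    }
}
```

Because Descendants is used, paragraphs nested inside table cells are returned as well, in document order.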

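I. First, create the document template.

Open Word 2010 or 2007. In Word 2010, go to File -> Options -> Customize Ribbon and check Developer so that the Developer tab appears on the ribbon. (In Word 2007 the equivalent setting is Office Button -> Word Options -> Popular -> Show Developer tab in the Ribbon.)

Then add text and picture content controls to the document, as shown in the figure below:

How to add one: pick a content control and give it default content (text or a picture). Then select the content control and click Developer -> Properties to set its Title or Tag (tagID). This step is important. The tagID is what identifies the content control, so give each control a distinct value; the code locates a specific content control by its tagID.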

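Final result (see Template.docx in the download):

The figure above shows that we added rich text, plain text, and picture content controls. Next we use code to replace the contents of these placeholder controls. This is the key technique behind automatic report generation.

If we open document.xml and look at the part for a text content control, the layout of the content control is clearly visible:

The figure above shows that the text content control is wrapped in an sdt (Structured Document Tag) element. From the earlier introduction we know that text ultimately lives in a Run -> Text element, so a replacement only needs to find the sdt element by the content control's tagID and replace the content of its Text element. Picture replacement is handled the same way, with a few extra details to watch. Every content control is wrapped in some sdt element, which may be an SdtBlock, SdtCell, SdtRun, and so on; all of them are subclasses of SdtElement.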
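
When a template does not behave as expected, it can help to list which content controls the SDK actually sees and which kind of sdt element each one is. The following sketch does that. ContentControlLister and Describe are hypothetical names chosen for this illustration, and the code assumes the DocumentFormat.OpenXml package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlLister
{
    // Returns entries such as "SdtBlock: TextPlaceholder_01" for every
    // content control in the body that carries a tag.
    public static List<string> Describe(WordprocessingDocument doc)
    {
        return doc.MainDocumentPart.Document.Body
            .Descendants<SdtElement>()
            .Select(sdt => new
            {
                Kind = sdt.GetType().Name, // SdtBlock, SdtRun, SdtCell, ...
                Tag = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value
            })
            .Where(x => x.Tag != null)     // controls without a tag are skipped
            .Select(x => x.Kind + ": " + x.Tag)
            .ToList();
    }
}
```

Controls that have a Title but no Tag are left out of the list, which is usually a sign that the Properties dialog was filled in incompletely.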

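II. Opening and closing a Word document with Open XML.

1. The Open XML class for working with Word documents is WordprocessingDocument. Its interface makes it easy to open and close a Word document. WordprocessingDocument.Open takes two parameters: the document path, and a flag that indicates whether the document is opened for editing.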

        /// <summary>
        /// Contains the word processing document
        /// </summary>
        private WordprocessingDocument _wordProcessingDocument;

        /// <summary>
        /// Contains the main document part
        /// </summary>
        private MainDocumentPart _mainDocPart;

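        /// <summary>
        /// Open a Word (.docx) document for editing
        /// </summary>
        /// <param name="docname">path of the document to be opened</param>
        public void OpenDocuemnt(string docname)
        {
            // open the word docx; the second argument (true) opens it for editing
            _wordProcessingDocument = WordprocessingDocument.Open(docname, true);

            // get the Main Document part
            _mainDocPart = _wordProcessingDocument.MainDocumentPart;
        }

        /// <summary>
        /// Close the document
        /// </summary>
        public void CloseDocument()
        {
            // Dispose() flushes changes (with the default AutoSave setting) and
            // releases the file. It also works on Open XML SDK 3.0 and later,
            // where Close() is no longer available.
            _wordProcessingDocument.Dispose();
        }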

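After opening the document, we get the main document part (that is, the word/document.xml part).

2. Now let's replace the text content controls in the document. We will try a TDD-style flow. First, we know the tagID of each content control and the text we want to put there; these two are our input: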
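
For short-lived access, a using block is a common alternative to separate open and close methods, because the document is released even when an exception is thrown. The sketch below opens a package read-only by passing false as the second argument and counts its content controls. TemplateProbe and CountContentControls are hypothetical names, and the Stream overload of Open is used here so the sketch also works on in-memory data.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplateProbe
{
    // Opens the package read-only and counts the content controls in its body.
    public static int CountContentControls(Stream docxStream)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxStream, false))
        {
            Body body = doc.MainDocumentPart?.Document?.Body;
            return body == null ? 0 : body.Descendants<SdtElement>().Count();
        }
    }
}
```

Opening read-only is enough for inspection; replacing text or pictures requires true, as in OpenDocuemnt above.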

var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替换文本"},
                                   {"PH_Name", "张三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替换"},
                               };

Then we want to call a method that replaces the text of every text content control in the template whose tagID matches:

        /// <summary>
        /// Updated text placeholders with texts.
        /// </summary>
        /// <param name="tagValueDict">Pair of placeholder tagID and text to replace.</param>
        public void UpdateText(Dictionary<string, string> tagValueDict)
        {
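            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var value = pair.Value;

                foreach (var sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    // Skip content controls without a tag instead of throwing on null
                    var tag = sdtElement.SdtProperties?.GetFirstChild<Tag>();
                    if (tag?.Val?.Value != tagID)
                    {
                        continue;
                    }

                    // Block and cell controls wrap a Paragraph; inline controls wrap an SdtContentRun.
                    // FirstOrDefault avoids the exception SingleOrDefault throws when there are several.
                    OpenXmlElement parentElement = sdtElement.Descendants<Paragraph>().FirstOrDefault();
                    if (null == parentElement)
                    {
                        parentElement = sdtElement.Descendants<SdtContentRun>().FirstOrDefault();
                    }

                    if (null != parentElement)
                    {
                        // Only the first run is updated; text in further runs is left as it is
                        Run r = parentElement.Descendants<Run>().FirstOrDefault();
                        Text t = r?.Descendants<Text>().FirstOrDefault();
                        if (null != t)
                        {
                            // Swap the old Text for a new one carrying the replacement value
                            r.AppendChild(new Text(value));
                            t.Remove();
                        }

                        break;
                    }
                }
            }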
        }

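The code above iterates over all sdt elements in the body. If the tagID of an sdt matches the tagID we are looking for, we have found the content control. We then find the Run element under that sdt and replace its Text child with a new Text that holds the new content.

3. Now let's see how to replace pictures, again in a TDD-style flow. First, we have the tagIDs of the picture content controls and the picture resources.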
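
Word often splits the text of one line across several runs, as the document.xml listing earlier shows, so a content control may hold more than one Run. The following sketch is one possible way to handle that case, not the article's method: it writes the value into the first Text element and removes the others. ContentControlText and SetText are hypothetical names.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlText
{
    // Puts value into the first Text element of the control and removes the
    // remaining Text elements, so text split across several runs is fully
    // replaced. Returns false when the control holds no Text element at all.
    public static bool SetText(SdtElement sdt, string value)
    {
        var texts = sdt.Descendants<Text>().ToList();
        if (texts.Count == 0)
        {
            return false;
        }

        texts[0].Text = value;
        foreach (Text extra in texts.Skip(1))
        {
            extra.Remove();
        }

        return true;
    }
}
```

The formatting of the first run is kept. Note that Text elements of any content control nested inside sdt are affected too, so this sketch suits leaf controls only.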

var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };

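Then we want to call a method that replaces the picture of every picture content control in the template whose tagID matches. As mentioned earlier, picture resources are stored under the media directory. Open XML manages them by assigning each resource a relationship id (rid), and other places in the document use that rid to refer to the resource. So we need to find the picture content control, find the id of the picture resource it references, use that id to read information about the placeholder such as the picture size, and then replace the resource the id points to. Here is the code: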

        internal static string GetImageRelID<TSdtType>(TSdtType sdt, string imageTag) where TSdtType : SdtElement
        {
            // loop through all tags within the sdt element
            foreach (Tag t in sdt.Descendants<Tag>())
            {
                // Do we have the correct tag? (case-insensitive comparison)
                if (string.Equals(t.Val?.Value, imageTag, StringComparison.OrdinalIgnoreCase))
                {
                    // Get the BLIP for the image - there is only one image per placeholder so no need to loop through anything
                    Blip b = sdt.Descendants<Blip>().FirstOrDefault();
                    if (null != b && null != b.Embed)
                    {
                        // return the relationship id of the embedded image
                        return b.Embed.Value;
                    }
                }
            }

            return string.Empty;
        }

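The code above looks, under a given sdt element, for the id of the image resource used by the content control with the matching tag. We then use that resource id to get the size of the placeholder image: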

        internal static void GetPlaceholderImageSize(IEnumerable<Drawing> drawingList, string relID, out int width, out int height)
        {
            width = -1;
            height = -1;

            // Loop through the Drawing elements that were passed in
            foreach (Drawing d in drawingList)
            {
                // Skip drawings whose picture (Blip) does not reference relID
                if (!d.Descendants<Blip>().Any(b => b.Embed != null && b.Embed.Value == relID))
                {
                    continue;
                }

                // The document size is in EMU. 1 pixel = 9525 EMU (at 96 DPI)

                // The size of the image placeholder is located in the EXTENT element
                Extent e = d.Descendants<Extent>().FirstOrDefault();
                if (null != e)
                {
                    width = (int)(e.Cx / 9525);
                    height = (int)(e.Cy / 9525);
                }

                if (width == -1)
                {
                    // The size of the image is located in the EXTENTS element
                    Extents e2 = d.Descendants<Extents>().FirstOrDefault();
                    if (null != e2)
                    {
                        width = (int)(e2.Cx / 9525);
                        height = (int)(e2.Cy / 9525);
                    }
                }

                // Stop at the first matching drawing that yields a size
                if (width != -1)
                {
                    return;
                }
            }
        }
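
The constant 9525 in the code above comes from the definition of the English Metric Unit: there are 914400 EMU per inch, so at 96 DPI one pixel is 914400 / 96 = 9525 EMU. A small helper makes the assumption about DPI explicit. The Emu class below is a hypothetical illustration and is not part of the Open XML SDK.

```csharp
using System;

public static class Emu
{
    // English Metric Units per inch, as defined by the Office Open XML format
    public const long PerInch = 914400;

    // Converts a length in EMU to pixels at the given resolution (96 DPI by default).
    public static int ToPixels(long emu, double dpi = 96.0)
    {
        return (int)Math.Round(emu * dpi / PerInch);
    }

    // Converts a length in pixels to EMU at the given resolution (96 DPI by default).
    public static long FromPixels(int pixels, double dpi = 96.0)
    {
        return (long)Math.Round(pixels * PerInch / dpi);
    }
}
```

With the default of 96 DPI, Emu.ToPixels(e.Cx) gives the same result as dividing by 9525 whenever the extent is a whole number of pixels.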

Once we have the size, we can use the resource id and the size to replace the placeholder image with the new image.

        private void UpdateImagePart(string relID, MemoryStream imageStream, int width, int height)
        {
            using (var originalBitmap = Image.FromStream(imageStream))
            using (var stream = new MemoryStream())
            {
                // Save image data to the buffer, resized to the placeholder size when it is known
                if (width != -1)
                {
                    using (var resized = new Bitmap(originalBitmap, width, height))
                    {
                        resized.Save(stream, originalBitmap.RawFormat);
                    }
                }
                else
                {
                    originalBitmap.Save(stream, originalBitmap.RawFormat);
                }

                // Get the ImagePart
                var imagePart = (ImagePart)_mainDocPart.GetPartById(relID);

                // Overwrite the current image in the docx file package.
                // FeedData replaces the whole part, so no stale bytes are left
                // behind when the new image is smaller than the old one.
                stream.Position = 0;
                imagePart.FeedData(stream);
            }
        }

Finally, we arrive at the interface for updating pictures:

        public void UpdateImage(Dictionary<string, MemoryStream> tagValueDict)
        {
            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var imageStream = pair.Value;

                foreach (SdtElement sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    string relID = GetImageRelID(sdtElement, tagID);
                    if (!string.IsNullOrEmpty(relID))
                    {
                        // Get size of image
                        int imageWidth;
                        int imageHeight;
                        GetPlaceholderImageSize(_mainDocPart.Document.Body.Descendants<Drawing>(), relID, out imageWidth, out imageHeight);

                        UpdateImagePart(relID, imageStream, imageWidth, imageHeight);

                        break;
                    }
                }
            }
        }
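
If a picture is not replaced, it can be useful to check which relationship ids the main document part actually maps to image parts. The sketch below lists them. ImageRelationshipLister and List are hypothetical names used only for this illustration, and the code assumes the DocumentFormat.OpenXml package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class ImageRelationshipLister
{
    // Returns one entry per image part in the form "<relationship id> -> <part URI>".
    public static List<string> List(MainDocumentPart mainPart)
    {
        return mainPart.ImageParts
            .Select(part => mainPart.GetIdOfPart(part) + " -> " + part.Uri)
            .ToList();
    }
}
```

The relationship id printed for an image should match the value returned by GetImageRelID for the corresponding content control.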

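III. Testing

Write a console test program that copies the template document to an output document and then replaces the text and pictures in the output document: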

        static void Main()
        {
            const string templateDocx = @"..\..\Template.docx";
            const string outputDocx = @"..\..\Output.docx";

            // copy the word doc so you can see the difference between the two
            File.Delete(outputDocx);
            File.Copy(templateDocx, outputDocx);

            var contentControlManager = new ContentControlManager();
            contentControlManager.OpenDocuemnt(outputDocx);

            var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替换文本"},
                                   {"PH_Name", "张三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替换"},
                               };

            contentControlManager.UpdateText(textDict);

            var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };

            contentControlManager.UpdateImage(imageDict);

            contentControlManager.CloseDocument();
        }
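
Main calls GetStreamFromImage, which is not shown in this article. The sketch below is an assumption about what such a helper might do: it simply loads the file's bytes into a MemoryStream. The actual helper in the download may be implemented differently, and the ImageFiles class name is invented for this illustration.

```csharp
using System.IO;

public static class ImageFiles
{
    // Loads an image file fully into memory and returns a stream positioned
    // at the start, so the file itself is not kept open.
    public static MemoryStream GetStreamFromImage(string path)
    {
        return new MemoryStream(File.ReadAllBytes(path));
    }
}
```

Because the bytes are read as they are, the image format (JPEG, PNG, and so on) is preserved for Image.FromStream to detect later.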

Open the generated Output.docx to see that the content has been replaced:


posted @ 2012-04-18 18:32 飘飘白云
