Code notes: automatically restructuring a localized message resource file with LINQ to XML (Xml2Linq) and CodeDom

Nothing profound here. This post is purely a record of code I've written, so passing experts may want to skip it ~

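Background: the project's localized messages are organized like this:

  1. An XML file, Messages.xml, holds the language resources
  2. A C# class, MessageKeys, holds every message key together with the Message ID it maps to in the XML resource file
  3. Application code refers to messages through these MessageKeys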
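Here is a minimal sketch of how the three pieces fit together: a key's Message ID is resolved against Messages.xml at runtime. It assumes Message IDs are unique across message types. MessageLookupSketch and Format are hypothetical names, not the project's actual MessageKey API, which this post does not show.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: the project's real MessageKey/MessageType types
// (namespace Util.Messaging) are not shown in this post.
public static class MessageLookupSketch
{
    // Finds <Message id="..."> anywhere in Messages.xml and fills the
    // {0}, {1}, ... placeholders of its <Description> with the arguments.
    public static string Format(XDocument messagesXml, string messageId, params object[] args)
    {
        XElement message = messagesXml
            .Descendants("Message")
            .FirstOrDefault(m => (string)m.Attribute("id") == messageId);

        if (message == null)
            throw new ArgumentException("Unknown message id: " + messageId, nameof(messageId));

        // (string) cast on XElement returns null when the element is missing
        string description = (string)message.Element("Description") ?? string.Empty;
        return string.Format(description, args);
    }
}
```

For example, with the sample XML below, `Format(doc, "110", "End date", "Start date")` yields "End date can't be less than Start date."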

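Now the project is going through a refactor. A higher-level set of Common Messages has been designed so that each sub-project's messages don't become too numerous to manage. I was "lucky" enough to be assigned the rather painful job of going through the nearly 2,000 messages of my sub-project one by one. That meant converting every message that could become a Common Message and renumbering both the Message IDs and the Message Codes. A BA also had to review and rewrite all the description text. It is a miserable manual chore. Just thinking about it was unbearable, and my eyes would never survive it, so I came up with a plan: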

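  1. Export Messages.xml, together with the corresponding MessageKey field names, to Excel so that everyone can analyze the spreadsheet
  2. After the team's discussion and pruning, arrive at a finalized spreadsheet
  3. Convert the spreadsheet back into Messages.xml and into the C# MessageKeys class (this is why the MessageKey names were exported to the spreadsheet in the first place)
  4. The developers start refactoring the code while the BA keeps polishing the descriptions at leisure; I simply rerun this tool to regenerate the XML
  5. Any future change can be made directly in the XML and the class regenerated automatically, which is very convenient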

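This involves a few small techniques:

  1. Build some DTOs (Data Transfer Objects) for the XML
  2. Export the XML for Excel, using System.Xml.Linq
  3. Import the spreadsheet back into XML, using System.Xml.Serialization and reading Excel as a data source (directly through a connection string, without Interop)
  4. Generate a C# code file from the spreadsheet, using System.CodeDom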

Enough talk, here's the code. This was my first time generating C# with CodeDom, and it was quite fun.

The XML file Messages.xml:

<Messages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <MessagesByType typeCode="1" typeName="Validation Error"> 
    <Message id="110"> 
      <EventId>50</EventId> 
      <Code>110</Code> 
      <ShortText>Invalid value </ShortText> 
      <Description>{0} can't be less than {1}.</Description> 
      <Variables> 
        <Variable formatIndex="0">Field name</Variable> 
        <Variable formatIndex="1">Field name</Variable> 
      </Variables> 
    </Message> 
  </MessagesByType> 
  <MessagesByType typeCode="2" typeName="Application Error"> 
    <Message id="410"> 
      <EventId>50</EventId> 
      <Code>410</Code> 
      <ShortText>Invalid value </ShortText> 
      <Description>{0} can't be less than {1}.</Description> 
      <Variables> 
        <Variable formatIndex="0">Field name</Variable> 
        <Variable formatIndex="1">Field name</Variable> 
      </Variables> 
    </Message> 
  </MessagesByType> 
</Messages>

Analyzing the XML above gives the following data models, which I put in a new file named DataObjXML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Serialization;

namespace RefactorMessages
{
    [Serializable]
    public class Messages
    {
        // Without [XmlElement("MessagesByType")], XmlSerializer would wrap the list
        // in an extra <MessagesByTypes> element around the <MessagesByType> items
        [XmlElement("MessagesByType")]
        public List<MessagesByType> MessagesByTypes { get; set; }
    }

    [Serializable]
    public class MessagesByType
    {
        [XmlElement("Message")]
        public List<Message> Messages { get; set; }

        [XmlAttribute]
        public string typeCode { get; set; }
        [XmlAttribute]
        public string typeName { get; set; }
    }

    [Serializable]
    public class Variable
    {
        [XmlAttribute]
        public string formatIndex { get; set; }
        [XmlText]
        public string text { get; set; }
    }

    [Serializable]
    public class Message
    {
        [XmlIgnore]
        [XmlAttribute]
        public string category { get; set; }
        [XmlAttribute]
        public string id { get; set; }
        [XmlIgnore]
        [XmlAttribute]
        public string old_id { get; set; }
        [XmlIgnore]
        public string TypeName { get; set; }
        [XmlIgnore]
        public string TypeCode { get; set; }
        [XmlIgnore]
        public string ClassName { get; set; }
        public string EventId { get; set; }
        public string Code { get; set; }
        public string ShortText { get; set; }
        public string Description { get; set; }
        public List<Variable> Variables { get; set; }
    }
}

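When building these XML data entities, note the following:

  1. Before I added [XmlElement("MessagesByType")] to the collection property MessagesByTypes, serialization produced <MessagesByTypes><MessagesByType>…</MessagesByType></MessagesByTypes>, which was not what I wanted. I tried adding the attribute and, somewhat to my surprise, it worked. The output is now <Messages ..><MessagesByType>..</MessagesByType></Messages>, without the extra node that represents the List. (Applying [XmlElement] to a collection property tells XmlSerializer to write the items directly, with no wrapper element.)
  2. To keep a property out of the serialized XML, add the [XmlIgnore] attribute.
  3. To serialize a property as an XML attribute rather than an element, mark it with [XmlAttribute].
  4. The classes are also marked [Serializable]. Strictly speaking, XmlSerializer does not need it: it only requires public types with a public parameterless constructor. [Serializable] matters for BinaryFormatter, which the Serializer helper below also uses.

These are all basic points, but I'm recording them here anyway.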
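The attribute rules above can be checked in isolation. This is a stripped-down sketch, not the post's DTOs: WrappedRoot, FlatRoot and XmlShapeDemo are hypothetical names. The two root classes differ only in the attribute on their list property.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class WrappedRoot
{
    // No attribute: the list is written inside a wrapper <Items> element
    public List<string> Items { get; set; } = new List<string>();
}

public class FlatRoot
{
    // [XmlElement] on a collection: each item becomes an <Item> directly under the root
    [XmlElement("Item")]
    public List<string> Items { get; set; } = new List<string>();

    // [XmlIgnore]: never written to the XML
    [XmlIgnore]
    public string Note { get; set; }

    // [XmlAttribute]: written as code="..." on the <FlatRoot> element
    [XmlAttribute("code")]
    public string Code { get; set; }
}

public static class XmlShapeDemo
{
    // Serializes any public type to an XML string
    public static string ToXml<T>(T obj)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, obj);
            return writer.ToString();
        }
    }
}
```

Serializing `new WrappedRoot { Items = { "a" } }` gives `<Items><string>a</string></Items>` inside the root. The FlatRoot version gives `<Item>a</Item>` directly under the root, with no trace of Note.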

With these basic data entities in place, the real work can begin.

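First, the XML has to be exported into something the team can sit around and easily analyze and discuss. This uses System.Xml.Linq. A message's typeCode can take one of 5 values, and I wanted to sort by typeCode first and then by message id before exporting. The complete code is below. Note that every value should be null-checked. (KeyNameInClass is what I need later when generating the C# class.)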

private DataTable GenerateDataTableFromXml(bool includeDetect = true)
{
    DataTable dt = null;

    Encoding ec = new UTF8Encoding(true);
    using (StreamReader sr = new StreamReader(_xmlPath, ec))
    {
        XDocument document = XDocument.Load(sr);

        // _xmlPath, _allXMLKeys and GetMessageVariables are defined elsewhere in the source download
        var query = from msg in document.Descendants(@"MessagesByType").Descendants(@"Message")
                    // sort by typeCode, then by message id (string order, so ids are assumed to have equal width, e.g. "001")
                    orderby (string)msg.Parent.Attribute("typeCode"), (string)msg.Attribute("id")
                    select new
                    {
                        TypeCode = (string)msg.Parent.Attribute("typeCode"),
                        TypeName = (string)msg.Parent.Attribute("typeName"),
                        MessageId = (string)msg.Attribute("id"),
                        // element names are case-sensitive: the XML uses <EventId>, not <EventID>
                        EventID = (string)msg.Element("EventId") ?? string.Empty,
                        Code = (string)msg.Element("Code") ?? string.Empty,
                        ShortText = (string)msg.Element("ShortText") ?? string.Empty,
                        Description = (string)msg.Element("Description") ?? string.Empty,
                        Variables = GetMessageVariables(msg.Element("Variables"), msg.Elements("Variable").ToList()),
                        //the xml key's field name in MessageKey class
                        KeyNameInClass = string.Join("\r\n", _allXMLKeys.Where(o => o.MessageId == (string)msg.Attribute("id")).Select(o => o.ClassName))
                    };

        var list = query.ToList();

        dt = new DataTable();
        dt.Columns.Add("Type");
        dt.Columns.Add("MessageId");
        dt.Columns.Add("EventID");
        dt.Columns.Add("Code");
        dt.Columns.Add("ShortText");
        dt.Columns.Add("Description");
        dt.Columns.Add("Variables");
        dt.Columns.Add("KeyNameInClass");

        foreach (var o in list)
        {
            DataRow dr = dt.NewRow();
            dr["Type"] = o.TypeName;
            dr["MessageId"] = o.MessageId;
            dr["EventID"] = o.EventID;
            dr["Code"] = o.Code;
            dr["ShortText"] = o.ShortText;
            dr["Description"] = o.Description;
            dr["Variables"] = o.Variables;
            dr["KeyNameInClass"] = o.KeyNameInClass;

            dt.Rows.Add(dr);
        }
    }

    return dt;
}

The method above builds the DataTable to be exported, which is then written out as a CSV file.
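The CSV step itself isn't shown. Here is a minimal sketch, assuming RFC 4180-style quoting. Quoting matters here because descriptions contain commas and KeyNameInClass is joined with "\r\n". CsvExportSketch is a hypothetical name.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public static class CsvExportSketch
{
    // Quote a field when it contains a comma, a double quote or a line break;
    // embedded double quotes are doubled.
    static string Escape(object value)
    {
        string s = value == null || value == DBNull.Value ? string.Empty : value.ToString();
        bool mustQuote = s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return mustQuote ? "\"" + s.Replace("\"", "\"\"") + "\"" : s;
    }

    // Writes a header row from the column names, then one line per DataRow
    public static void Write(DataTable table, TextWriter writer)
    {
        writer.WriteLine(string.Join(",", table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName))));
        foreach (DataRow row in table.Rows)
        {
            writer.WriteLine(string.Join(",", row.ItemArray.Select(v => Escape(v))));
        }
    }
}
```

To produce a file Excel opens cleanly with non-ASCII text, pass a StreamWriter created with `new UTF8Encoding(true)` so the file starts with a BOM.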

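Next came a passionate (and soul-destroying) discussion of nearly 2,000 rows, with plenty of cutting and trimming. We ended up with a brand-new spreadsheet and saved the version everyone signed off on in a sheet called, let's say, Finalized. Now I need to turn that sheet back into XML in exactly the same format as the original Messages.xml. Plain ADO.NET reads the Excel data through the connection string "Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;". Note that the Jet 4.0 provider only handles .xls workbooks and only runs in 32-bit processes. From that data I build the Messages object defined earlier, ready to be serialized to XML.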

private Messages PrepareObjectFromExcel(string excelFilePath, string sheetName)
{
    var fileName = excelFilePath;
    var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);

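    var ds = new DataSet();
    using (var adapter = new OleDbDataAdapter("SELECT * FROM [" + sheetName + "$]", connectionString))
    {
        adapter.Fill(ds, sheetName);
    }

    // Field<string> assumes the cells come back as text: Jet may type a purely numeric
    // column as double, so keep ID/code columns formatted as Text in the sheet.
    // GetCodeFromMessageId and ReadVariables are helpers from the source download.
    var data = ds.Tables[sheetName].AsEnumerable();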

    var query = data.Select(x =>
        new Message
        {
            TypeName = x.Field<string>("Type Name"),
            TypeCode = x.Field<string>("Type Code"),
            ClassName = x.Field<string>("Key Name In Class"),
            old_id = x.Field<string>("OldMessageId"),
            category = x.Field<string>("Category"),
            id = x.Field<string>("MessageId"),
            EventId = x.Field<string>("EventID"),
            Code = GetCodeFromMessageId(x.Field<string>("MessageId")),
            ShortText = x.Field<string>("ShortText"),
            Description = x.Field<string>("Description"),
            Variables = ReadVariables(x.Field<string>("Variables")),
        }).ToList<Message>().GroupBy(o => new { o.TypeCode, o.TypeName });

    Messages msg = new Messages();
    var msgsByTypes = new List<MessagesByType>();

    foreach (var type in query)
    {
        MessagesByType msgBody = new MessagesByType
        {
            typeCode = type.Key.TypeCode,
            typeName = type.Key.TypeName,
            Messages = type.ToList()
        };

        msgsByTypes.Add(msgBody);
    }

    msg.MessagesByTypes = msgsByTypes;
    return msg;
}
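The Jet connection string above is limited to legacy .xls files. Here is a small sketch of picking the provider by file extension. It assumes the ACE provider (Microsoft Access Database Engine, installed separately) for .xlsx, with HDR=YES meaning the first row holds the column names. ExcelConnectionSketch is a hypothetical helper.

```csharp
using System;
using System.IO;

public static class ExcelConnectionSketch
{
    // Jet 4.0 reads only .xls and only in 32-bit processes;
    // .xlsx needs the ACE OLE DB provider.
    public static string Build(string excelFilePath)
    {
        string ext = Path.GetExtension(excelFilePath).ToLowerInvariant();
        if (ext == ".xls")
        {
            return string.Format(
                "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;HDR=YES\";",
                excelFilePath);
        }
        if (ext == ".xlsx")
        {
            return string.Format(
                "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES\";",
                excelFilePath);
        }
        throw new NotSupportedException("Unsupported workbook type: " + ext);
    }
}
```

Extended Properties is quoted because it now carries more than one setting.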

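Then a few lines of code serialize it and save it as an XML file (the complete Serializer.cs follows; it is also in the source download). By the way, if you want a smaller XML file and don't care about formatting, settings.Indent = true can be set to false: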

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO.Compression;
using System.Xml.Serialization;

namespace RefactorMessages
{
    /// <summary>
    /// from TFS\FOTFP\R2\Common\CBP\HP.FOT.CBP.Shared\Helpers\Serializer.cs
    /// </summary>
    public class Serializer
    {
        const int STRING_MAX_LENGTH = 16 * 1024 + 256;//the length of any string object < 16K

        #region for binaryformatter serialize
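        public byte[] SerializeObject(object serializableObject)
        {
            // BinaryFormatter works on .NET Framework; it is obsolete in .NET 5+ and unsafe for untrusted data
            using (MemoryStream stream = new MemoryStream())
            {
                BinaryFormatter b = new BinaryFormatter();
                b.Serialize(stream, serializableObject);
                // ToArray returns only the written bytes; GetBuffer would also return unused capacity
                return stream.ToArray();
            }
        }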

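        /// <summary>
        /// Serializes an object with BinaryFormatter and returns the bytes as a Base64 string.
        /// </summary>
        /// <typeparam name="T">the type of the object to be serialized</typeparam>
        /// <param name="serializableObject">the object to serialize; its type must be marked [Serializable]</param>
        /// <returns>the Base64-encoded serialized object</returns>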
        public static string SerializeObjectToString<T>(T serializableObject)
        {
            Serializer serializer = new Serializer();
            byte[] bytes = serializer.SerializeObject(serializableObject);

            // SEP this was causing problems with deserialization so I commented it out
            //bytes = Compress(bytes);

            return ToString(bytes);
        }

        #endregion

        #region for binaryformatter deserialize
        public object DeSerializeObject(byte[] someBytes)
        {
            using (MemoryStream stream = new MemoryStream(someBytes))
            {
                BinaryFormatter b = new BinaryFormatter();
                return b.Deserialize(stream);
            }
        }

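        /// <summary>
        /// Decodes a Base64 string produced by SerializeObjectToString and deserializes it with BinaryFormatter.
        /// </summary>
        /// <typeparam name="T">the type of the deserialized object</typeparam>
        /// <param name="serializedObjectBytesStr">the Base64 string to decode</param>
        /// <returns>the deserialized object, or default(T) if anything fails</returns>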
        public static T DeSerializeObject<T>(string serializedObjectBytesStr)
        {
            T retObj;

            try
            {
                byte[] bytes = ToBytes(serializedObjectBytesStr);

                // SEP this was causing problems with deserialization so I commented it out
                //bytes = Decompress(bytes);

                Serializer serializer = new Serializer();
                retObj = (T)serializer.DeSerializeObject(bytes);
            }
            catch
            {
                retObj = default(T);
            }

            return retObj;
        }

        #endregion

        public static string SerializeObject2XML(object serializableObject)
        {
            if (serializableObject == null) return null;

            return SerializeObject2XML(serializableObject, serializableObject.GetType());
        }

        public static string SerializeObject2XML(object serializableObject, Type type)
        {
            if (serializableObject == null) return null;

            if (type == null) type = serializableObject.GetType();

            System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
            settings.OmitXmlDeclaration = true;
            settings.Indent = true; // false gives a smaller, unformatted XML

            System.Text.StringBuilder builder = new System.Text.StringBuilder();
            // disposing the writer guarantees everything is flushed into the builder
            using (System.Xml.XmlWriter xw = System.Xml.XmlWriter.Create(builder, settings))
            {
                // serialize using the type supplied by the caller
                System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(type);
                x.Serialize(xw, serializableObject);
            }

            return builder.ToString();
        }

        /// <summary>
        /// Deserializes an XML string into an object of type T
        /// </summary>
        /// <param name="xml">the XML string to deserialize</param>
        /// <param name="obj">Output T object</param>
        /// <param name="exception">output Exception value if deserialize failed</param>
        /// <returns>true if this XmlSerializer can deserialize the object; otherwise, false</returns>
        public static bool Deserialize<T>(string xml, out T obj, out System.Exception exception)
        {
            exception = null;
            obj = default(T);
            try
            {
                obj = Deserialize<T>(xml);
                return true;
            }
            catch (System.Exception ex)
            {
                exception = ex;
                return false;
            }
        }

        public static bool Deserialize<T>(string xml, out T obj)
        {
            System.Exception exception = null;
            return Deserialize(xml, out obj, out exception);
        }

        public static T Deserialize<T>(string xml)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(T));
            using (System.IO.StringReader stringReader = new System.IO.StringReader(xml))
            using (System.Xml.XmlReader xmlReader = System.Xml.XmlReader.Create(stringReader))
            {
                return (T)serializer.Deserialize(xmlReader);
            }
        }

        #region private method
        private static string ToString(byte[] ms)
        {
            return Convert.ToBase64String(ms);
        }

        private static byte[] ToBytes(string serializedObj)
        {
            return Convert.FromBase64String(serializedObj);
        }

        private static byte[] Compress(byte[] buffer)
        {
            MemoryStream ms = new MemoryStream();
            DeflateStream stream = new DeflateStream(ms, CompressionMode.Compress, true);
            stream.Write(buffer, 0, buffer.Length);
            stream.Close();

            buffer = ms.ToArray();
            ms.Close();

            return buffer;
        }

        private static byte[] Decompress(byte[] buffer)
        {
            // Read until the end of the stream: a single Read() into a fixed
            // STRING_MAX_LENGTH buffer can return fewer bytes or truncate large data.
            using (MemoryStream input = new MemoryStream(buffer))
            using (DeflateStream stream = new DeflateStream(input, CompressionMode.Decompress))
            using (MemoryStream output = new MemoryStream())
            {
                stream.CopyTo(output);
                return output.ToArray();
            }
        }
        #endregion
    }
}
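The "few lines" that save the XML aren't shown. One option is to write straight to a file instead of going through a string. This is a sketch, and XmlFileSketch.Save is a hypothetical helper. Unlike SerializeObject2XML it keeps the XML declaration; set OmitXmlDeclaration = true to match the original Messages.xml exactly.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class XmlFileSketch
{
    // Writes obj to path as UTF-8 XML; indent: false gives a smaller file.
    public static void Save<T>(T obj, string path, bool indent = true)
    {
        var settings = new XmlWriterSettings
        {
            Indent = indent,
            Encoding = new UTF8Encoding(true), // with BOM, matching the UTF8Encoding(true) used for reading
            OmitXmlDeclaration = false
        };
        var serializer = new XmlSerializer(typeof(T));
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            serializer.Serialize(writer, obj);
        }
    }
}
```

With the post's types, that would be something like `XmlFileSketch.Save(PrepareObjectFromExcel(path, "Finalized"), "Messages.xml");`.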

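Next comes the fun part: generating a C# file from code. One thing worth mentioning is that the previous developers had organized MessageKeys into many #regions by business scope. I didn't want to throw those away, so I added a Category column to the spreadsheet and maintain it by hand. Fortunately the Message ID numbering scheme is tied to the category, so this manual maintenance isn't hard. CodeDom supports directives, so #region/#endregion can be emitted before and after a member.

The really annoying part is that, for whatever reason, CodeDom cannot generate static classes. The provider does have Supports(GeneratorSupport.PublicStaticMembers) (see the code), but that method only reports whether a feature is supported and changes nothing in the output. Worse, CodeDom has no way to add the readonly modifier to a member either. So I resorted to the rather tricky workaround of writing the field type as "readonly MessageKey". The only remaining blemish is that the generated class is not the public static class I wanted, so I have to add static by hand. I'll update this section if I find a way (none so far).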
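Until something better turns up, the manual "add static" step can at least be automated by post-processing the generated text. This is a minimal sketch, assuming every class in the generated file should be static. That holds for the MessageKeys output later in this post, which contains only static fields and nested classes. StaticClassPatch is a hypothetical name. For the readonly problem, CodeSnippetTypeMember (which emits raw text as a type member) is another possible route.

```csharp
using System.Text.RegularExpressions;

public static class StaticClassPatch
{
    // CodeDom cannot express "static class", so patch the generated source:
    // every "public class" declaration becomes "public static class".
    // Only safe when the file holds nothing but static members and nested classes.
    public static string MakeClassesStatic(string generatedCode)
    {
        return Regex.Replace(generatedCode, @"\bpublic class\b", "public static class");
    }
}
```

In Generate, one could write into a StringWriter instead of the StreamWriter, pass the text through MakeClassesStatic, then save it with File.WriteAllText.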

Here is GenerateCodeMessageKeyClass.cs, the file that generates the code with System.CodeDom (it is also in the source download):

using System;
using System.Reflection;
using System.IO;
using System.CodeDom;
using System.CodeDom.Compiler;
using Microsoft.CSharp;
using System.Diagnostics;
using System.Linq;

namespace RefactorMessages
{
    /// <summary> 
    /// Builds a CodeDOM graph (CodeCompileUnit) for the MessageKeys class and
    /// generates C# source code from it with the C# CodeDomProvider. 
    /// </summary> 
    public class GenerateCodeMessageKeyClass
    {
        public enum DirectiveType
        {
            None,
            OnlyStart,
            OnlyEnd,
            Both,
        }

        /// <summary> 
        /// Define the compile unit to use for code generation.  
        /// </summary>
        CodeCompileUnit targetUnit;

        /// <summary> 
        /// The top-level MessageKeys class; one nested class per message type
        /// is added to it, each holding one field per message.  
        /// </summary>
        CodeTypeDeclaration targetClass;

        /// <summary> 
        /// Define the class. 
        /// </summary> 
        public GenerateCodeMessageKeyClass(Messages messages, string tower)
        {
            targetUnit = new CodeCompileUnit();

            CodeNamespace clsMessageKeys = new CodeNamespace("Util.LoggingSupport");

            clsMessageKeys.Imports.Add(new CodeNamespaceImport("System"));
            clsMessageKeys.Imports.Add(new CodeNamespaceImport("Util.Messaging"));

            targetClass = new CodeTypeDeclaration(tower.ToString().ToUpper() + "MessageKeys");
            targetClass.IsClass = true;
            targetClass.TypeAttributes = TypeAttributes.Public;

            clsMessageKeys.Types.Add(targetClass);
            targetUnit.Namespaces.Add(clsMessageKeys);

            foreach (var msg in messages.MessagesByTypes)
            {
                AddSubClass(msg);
            }
        }

        /// <summary> 
        /// Adds a nested class for one message type, wrapping each category's fields in a #region. 
        /// </summary> 
        public void AddSubClass(MessagesByType messagesByType)
        {
            CodeTypeDeclaration cls = new CodeTypeDeclaration();
            cls.Name = string.Join("", messagesByType.typeName.Split(' ')) + "Messages";
            cls.IsClass = true;
            cls.TypeAttributes = TypeAttributes.Public;
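            // Note: MemberAttributes.Static has no effect on a type declaration; the class is still emitted as non-static
            cls.Attributes = MemberAttributes.Static;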

            string comment;

            switch (messagesByType.typeCode)
            {
                case "1":
                    comment = "Validation Error 200~399";
                    break;

                case "2":
                    comment = "Application Error 400~799";
                    break;

                case "3":
                    comment = "System Error 800~999";
                    break;

                case "4":
                    comment = "Info Messages 001~099";
                    break;

                case "5":
                    comment = "Warn Messages 100~199";
                    break;

                default:
                    comment = string.Empty;
                    break;

            }
            cls.Comments.Add(new CodeCommentStatement(comment));

            var query = messagesByType.Messages.GroupBy(o => o.category);

            foreach (var grp in query)
            {
                int index = 0;

                var list = grp.ToList();
                foreach (var msg in list)
                {
                    DirectiveType directiveType = DirectiveType.None;

                    if (!string.IsNullOrEmpty(grp.Key))
                    {
                        if (list.Count == 1) { directiveType = DirectiveType.Both; }
                        else if (index == 0) { directiveType = DirectiveType.OnlyStart; }
                        else if (index == list.Count - 1) { directiveType = DirectiveType.OnlyEnd; }
                    }

                    AddFields(directiveType, grp.Key, cls, msg);

                    index++;
                }


            }

            targetClass.Members.Add(cls);
        }

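        /// <summary>
        /// Adds one "public static readonly MessageKey" field for a message.
        /// </summary>
        /// <param name="directiveType">whether this field opens and/or closes the category's #region</param>
        /// <param name="categoryName">region text (the spreadsheet's Category column)</param>
        /// <param name="cls">the nested class receiving the field</param>
        /// <param name="msg">the message to generate the field for</param>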
        public void AddFields(DirectiveType directiveType, string categoryName, CodeTypeDeclaration cls, Message msg)
        {
            // Declare the message key field.
            CodeMemberField field = new CodeMemberField();
            field.Attributes = MemberAttributes.Public | MemberAttributes.Static;
            field.Name = msg.ClassName;
            // MemberAttributes has no readonly flag, so "readonly" is smuggled into the type name
            field.Type = new CodeTypeReference("readonly MessageKey");

            string messageType;

            switch (msg.TypeCode)
            {
                case "1":
                    messageType = "MessageType.VALIDATION_ERROR";
                    break;

                case "2":
                    messageType = "MessageType.APPLICATION_ERROR";
                    break;

                case "3":
                    messageType = "MessageType.SYSTEM_ERROR";
                    break;

                case "4":
                    messageType = "MessageType.INFO";
                    break;

                case "5":
                    messageType = "MessageType.WARN";
                    break;

                default:
                    messageType = "MessageType.UNASSIGNED";
                    Debug.Assert(false, "need to modify spreadsheet to specify the type");
                    break;
            }

            field.InitExpression = new CodeObjectCreateExpression(
                new CodeTypeReference("MessageKey"),
                new CodeTypeReferenceExpression(messageType),
                new CodePrimitiveExpression(msg.id));

            // Add region
            if (directiveType == DirectiveType.OnlyStart || directiveType == DirectiveType.Both)
            {
                field.StartDirectives.Add(new CodeRegionDirective { RegionMode = CodeRegionMode.Start, RegionText = categoryName });
            }
            if (directiveType == DirectiveType.OnlyEnd || directiveType == DirectiveType.Both)
            {
                field.EndDirectives.Add(new CodeRegionDirective { RegionMode = CodeRegionMode.End, RegionText = categoryName });
            }

            cls.Members.Add(field);
        }

        /// <summary> 
        /// Generate CSharp source code from the compile unit. 
        /// </summary> 
        /// <param name="filename">Output file name</param>
        public void Generate(string fileName)
        {
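            CodeDomProvider provider = CodeDomProvider.CreateProvider("CSharp");
            // provider.Supports(GeneratorSupport.PublicStaticMembers) would only return a bool
            // saying whether the feature is supported; it does not change the generated code.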

            CodeGeneratorOptions options = new CodeGeneratorOptions();
            options.BracingStyle = "Block";

            using (StreamWriter sourceWriter = new StreamWriter(fileName))
            {
                provider.GenerateCodeFromCompileUnit(
                    targetUnit, sourceWriter, options);
            }
        }
    }
}

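The generated class ends up looking roughly like this. The using directives land inside the namespace. Generation could of course start differently: right now it starts from the namespace. The provider also offers GenerateCodeFromExpression and GenerateCodeFromStatement, so you could, again somewhat "trickily", write the two using lines first and then the namespace, without needing Imports at all. In short, CodeDom has plenty of other uses; when you need it, Google will tell you.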

namespace LoggingSupport {
    using System;
    using Util.Messaging;
    
    
    public class MessageKeys {
        
        // Validation Error 200~399
        public class ValidationErrorMessages {
            
            #region AM Common
            public static readonly MessageKey ValCannotBeLessThan = new MessageKey(MessageType.VALIDATION_ERROR, "110");
            
            public static readonly MessageKey ValCannotBeGreaterThan = new MessageKey(MessageType.VALIDATION_ERROR, "111");
            #endregion
        }
        
        // Warn Messages 100~199
        public class WarningMessages {
            //...
        }
    }
}
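Another way to get the usings at file level is to attach them to a CodeNamespace whose name is empty. The C# generator writes no namespace block for it, and emits its imports at the top of the file. This is a sketch mirroring the post's namespace and imports; TopLevelUsingsSketch is a hypothetical name.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class TopLevelUsingsSketch
{
    public static string Generate()
    {
        var unit = new CodeCompileUnit();

        // Unnamed namespace: holds only the imports, emitted at file level
        var globalNs = new CodeNamespace();
        globalNs.Imports.Add(new CodeNamespaceImport("System"));
        globalNs.Imports.Add(new CodeNamespaceImport("Util.Messaging"));
        unit.Namespaces.Add(globalNs);

        // The real namespace, now without Imports of its own
        var ns = new CodeNamespace("Util.LoggingSupport");
        ns.Types.Add(new CodeTypeDeclaration("MessageKeys") { IsClass = true });
        unit.Namespaces.Add(ns);

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions { BracingStyle = "Block" });
            return writer.ToString();
        }
    }
}
```

The output starts with the usual auto-generated header comment, followed by `using System;` and `using Util.Messaging;`, and only then `namespace Util.LoggingSupport {`.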

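Building this little tool was actually quite fun. Unfortunately, few sub-projects still need this particular refactoring, so it isn't all that useful now, but at least it set me free XD. Had this tool been running in the sub-project teams when the refactor started, it would have saved everyone a lot of manual copy & paste and eye-straining drudgery. It is still useful for maintenance, though, so it's a worthwhile little tool.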

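Later I may turn this console project into a small WinForms app, since passing in a pile of path arguments is tedious. That's all for this note. Writing this post took half an hour, well worth it! From now on I should write down more of these little things!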

Project source code download

BTW, how do I add tags in WLW (Windows Live Writer)?

posted @ 2013-04-14 23:16 Elaine Shi