Converting LaTeX to MathType format

 

Main goals:

  • Share how to convert LaTeX formulas to MathType formulas in a development setting and render them in Word
  • Invite better suggestions from readers

 

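Background:

  • On the web, science exam questions are usually rendered as structured question data + LaTeX, or as HTML + LaTeX. If such questions have to be downloaded into Word with the formulas rendered faithfully, MathType format is currently a fairly good choice
  • In Word, formula display quality and ease of editing rank as MathType > OMML > image
  • Not every LaTeX formula converts cleanly to MathType format, so LaTeX to OMML conversion is still needed as a fallback for formulas that cannot be converted to MathType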
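
The fallback order in the list above can be written down as a small selection function. This is only a sketch: the `FormulaFormat` enum and the two boolean flags are hypothetical names introduced for illustration, not part of the service described in this post.

```csharp
using System;

// Hypothetical enum naming the three representations discussed above.
public enum FormulaFormat { MathType, Omml, Image }

public static class FormulaFallback
{
    // Pick the best available representation in the order MathType > OMML > image.
    public static FormulaFormat Choose(bool mathTypeOk, bool ommlOk)
    {
        if (mathTypeOk) return FormulaFormat.MathType;
        if (ommlOk) return FormulaFormat.Omml;
        return FormulaFormat.Image;
    }
}
```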

 

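Implementation overview:

With some modification, the functionality provided by the MathType SDK can be used to convert LaTeX to MathType. The SDK I found is C# code and has to run on a Windows server. Because the SDK calls COM interfaces, it can only run single-threaded, so several Windows servers need to be deployed to get enough throughput. The LaTeX to MathType service exchanges data through RocketMQ; its main logic is outlined below.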
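
One way to respect the single-thread constraint inside a process is to funnel every conversion through one dedicated thread. The sketch below assumes that a single STA thread is what the COM calls need; `StaConversionWorker` is a hypothetical helper, not part of the MathType SDK or of the linked repository.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs every submitted job on one dedicated thread.
public sealed class StaConversionWorker : IDisposable
{
    private readonly BlockingCollection<Action> _jobs = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public StaConversionWorker()
    {
        _thread = new Thread(() =>
        {
            // Jobs are executed strictly one after another.
            foreach (Action job in _jobs.GetConsumingEnumerable()) job();
        });
        _thread.IsBackground = true;
        // Apartment state can only be set on Windows; elsewhere it is skipped.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            _thread.SetApartmentState(ApartmentState.STA);
        _thread.Start();
    }

    // Queue a piece of work and get a task that completes with its result.
    public Task<T> Run<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _jobs.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception e) { tcs.SetException(e); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _jobs.CompleteAdding();
        _thread.Join();
        _jobs.Dispose();
    }
}
```

A message consumer could then call something like `worker.Run(() => GetOLEAndWMFFromOneWord(latex))`, so that concurrent messages never touch the COM objects at the same time.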

Message body:

{
   "latex": "2^a",
   "mml": ""  // some special formulas must be converted to MML (MathML) beforehand
}

Result message:

{
   "latex": "2^a",
   "ole": "",
   "wmf": ""
}

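A formula in MathType format consists of two parts: ole, the OLE (Object Linking and Embedding) object, which is what lets the formula be reopened and edited; and wmf, a vector image.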
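
The two messages can be mapped to plain classes. The following is a minimal sketch using System.Text.Json; the class names `ConvertRequest`, `ConvertResult` and `MessageCodec` are hypothetical. Comments are skipped on read because the sample message body above carries one.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for the request message.
public sealed class ConvertRequest
{
    [JsonPropertyName("latex")] public string Latex { get; set; } = "";
    [JsonPropertyName("mml")] public string Mml { get; set; } = "";
}

// Hypothetical DTO for the result message.
public sealed class ConvertResult
{
    [JsonPropertyName("latex")] public string Latex { get; set; } = "";
    [JsonPropertyName("ole")] public string Ole { get; set; } = "";
    [JsonPropertyName("wmf")] public string Wmf { get; set; } = "";
}

public static class MessageCodec
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        // Tolerate "// ..." comments like the one in the sample message body.
        ReadCommentHandling = JsonCommentHandling.Skip,
        AllowTrailingCommas = true
    };

    public static ConvertRequest ParseRequest(string json)
    {
        return JsonSerializer.Deserialize<ConvertRequest>(json, Options)
            ?? throw new ArgumentException("empty message", nameof(json));
    }

    public static string Serialize(ConvertResult result)
    {
        return JsonSerializer.Serialize(result, Options);
    }

    public static ConvertResult ParseResult(string json)
    {
        return JsonSerializer.Deserialize<ConvertResult>(json, Options)
            ?? throw new ArgumentException("empty message", nameof(json));
    }
}
```

Judging by the key code, which reads both values as text from the temporary Word file, the ole and wmf fields appear to carry base64 text, so plain string properties are enough here.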

 

 

 

Key code:

    // Excerpt from the converter class; wordAppGlobal, wordDocGlobal and path are fields defined elsewhere.
    public static MathTypeModel GetOLEAndWMFFromOneWord(String latex) {
        try {
            Object Nothing = Missing.Value;
            // Get the MTEF bytes (MathType's intermediate format)
            byte[] m_bMTEF = GetMTEFBytesFromLatex(latex);
            // Open the temporary Word file that holds the MathType object
            wordDocGlobal = wordAppGlobal.Documents.Open(ref path,
                ref Nothing, ref Nothing, ref Nothing, ref Nothing, ref Nothing,
                ref Nothing, ref Nothing, ref Nothing, ref Nothing, ref Nothing,
                ref Nothing, ref Nothing, ref Nothing, ref Nothing, ref Nothing);
            // Convert the MTEF to MathType format and write it into the temporary Word file via the clipboard
            DealWordFile(m_bMTEF);
            wordDocGlobal.Close(ref Nothing, ref Nothing, ref Nothing);
            // Read the MathType ole and wmf back from the temporary Word file
            MathTypeModel mathType = readMathTypeDontDelete();
            Console.WriteLine(latex);
            return mathType;
        } catch (Exception e) {
            Console.WriteLine(e.Message);
            DeInitWord();
            Restart();
        }
        return null;
    }

    public static byte[] GetMTEFBytesFromLatex(String latex) {
        MTSDK m_sdk = new MTSDK();
        if (!m_sdk.Init()) return null;

        IDataObject dataObject = MathTypeSDK.getIDataObject();
        if (dataObject == null) {
            m_sdk.DeInit();
            return null;
        }

        FORMATETC formatEtc = new FORMATETC();
        STGMEDIUM stgMedium = new STGMEDIUM();

        try {
            // Setup the formatting information to use for the conversion.
            formatEtc.cfFormat = (Int16)DataFormats.GetFormat("TeX Input Language").Id;
            formatEtc.dwAspect = DVASPECT.DVASPECT_CONTENT;
            formatEtc.lindex = -1;
            formatEtc.ptd = (IntPtr)0;
            formatEtc.tymed = TYMED.TYMED_HGLOBAL;

            // Setup the LaTeX content to convert
            stgMedium.unionmember = Marshal.StringToHGlobalAuto(latex);
            stgMedium.tymed = TYMED.TYMED_HGLOBAL;
            stgMedium.pUnkForRelease = 0;

            // Perform the conversion
            dataObject.SetData(ref formatEtc, ref stgMedium, false);

            // Set the format for the output
            formatEtc.cfFormat = (Int16)DataFormats.GetFormat("MathType EF").Id;
            //formatEtc.cfFormat = (Int16)DataFormats.GetFormat("Embed Source").Id;
            formatEtc.dwAspect = DVASPECT.DVASPECT_CONTENT;
            formatEtc.lindex = -1;
            formatEtc.ptd = (IntPtr)0;
            formatEtc.tymed = TYMED.TYMED_ISTORAGE;

            // Create a blank data structure to hold the converted result.
            stgMedium = new STGMEDIUM();
            stgMedium.tymed = TYMED.TYMED_NULL;
            stgMedium.pUnkForRelease = 0;

            // Get the conversion result in MTEF format
            dataObject.GetData(ref formatEtc, out stgMedium);
        } catch (COMException e) {
            Console.WriteLine("COMException:" + e.Message);
            // Release the SDK here as well, matching the other exit paths
            m_sdk.DeInit();
            ReleaseComObject(dataObject);
            return null;
        }

        // The pointer now becomes a Handle reference.
        HandleRef handleRef = new HandleRef(null, stgMedium.unionmember);

        try {
            // Lock in the handle to get the pointer to the data
            IntPtr ptrToHandle = MathTypeSDK.GlobalLock(handleRef);

            // Get the size of the memory block
            int m_iMTEF_Length = MathTypeSDK.GlobalSize(handleRef);

            // New an array of bytes and Marshal the data across.
            byte[] m_bMTEF = new byte[m_iMTEF_Length];
            Marshal.Copy(ptrToHandle, m_bMTEF, 0, m_iMTEF_Length);
            return m_bMTEF;
        } catch (Exception e) {
            Console.WriteLine("Exception:" + e.Message);
        } finally {
            MathTypeSDK.GlobalUnlock(handleRef);
            m_sdk.DeInit();
            ReleaseComObject(dataObject);
        }
        return null;
    }

    public static Boolean GetWMFBase64FromClipboard(byte[] m_bMTEF) {
        if (m_bMTEF == null || m_bMTEF.Length < 1) {
            return false;
        }
        MTSDK m_sdk = new MTSDK();
        try {
            short int_iType = -3;
            short int_iFormat = 4;
            short out_iType = -2;
            short out_iFormat = 6;
            m_sdk.Init();
            Int32 stat = 0;
            Int32 iBufferLength = 5000;
            StringBuilder strDest = new StringBuilder(iBufferLength);
            MTAPI_DIMS dims = new MTAPI_DIMS();
            string wmfFilePath = GetDataPath(System.Guid.NewGuid().ToString("N") + ".wmf");
            stat = MathTypeSDK.Instance.MTXFormEqnMgn(
                int_iType, int_iFormat, m_bMTEF, m_bMTEF.Length,
                out_iType, out_iFormat, strDest, iBufferLength,
                wmfFilePath, ref dims);
            // save equation
            if (stat == MathTypeReturnValue.mtOK) {
                return true;
            } else {
                Restart();
            }
        } catch (Exception e) {
            Console.WriteLine(e.Message);
        } finally {
            m_sdk.DeInit();
            //while ((MathTypeSDK.Instance.MTAPIDisconnectMgn()) != 0) ;
        }
        // Reached only when the transform failed or threw
        return false;
    }

    public static MathTypeModel readMathTypeDontDelete() {
        MathTypeModel model = new MathTypeModel();
        XmlDocument xmlDoc = new XmlDocument();
        // The temporary Word file is loaded as XML, so it is presumably saved in Word's XML package format
        xmlDoc.Load(path.ToString());
        XmlNode node = xmlDoc.ChildNodes[2];
        XmlNodeList xnl = node.ChildNodes;
        foreach (XmlNode pkg in xnl) {
            // The first attribute holds the part name, e.g. a path ending in .wmf or .bin
            string pkgname = pkg.Attributes[0].Value;
            //Console.WriteLine(pkgname);
            if (pkgname.Contains(".wmf")) {
                string wmf = pkg.ChildNodes[0].InnerText;
                model.wmf = wmf.Replace("\r\n", "");
                //Console.WriteLine(pkg.ChildNodes[0].InnerText);
            }
            if (pkgname.Contains(".bin")) {
                string ole = pkg.ChildNodes[0].InnerText;
                model.ole = ole.Replace("\r\n", "");
                //Console.WriteLine(pkg.ChildNodes[0].InnerText);
            }
        }
        model.type = "1";
        return model;
    }

  

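After getting the result, the MathType formula has to be checked for correctness (when the conversion goes wrong, the .wmf image contains red characters).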
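
One possible way to automate this check is to render the WMF to a bitmap and look for red pixels. The sketch below assumes System.Drawing is available (Windows) and that the wmf field is base64 text; the thresholds in `IsReddish` are guesses that would need tuning against real failed conversions, and `WmfChecker` is a hypothetical name.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class WmfChecker
{
    // Assumed definition of "red": strong red channel, weak green and blue.
    public static bool IsReddish(Color c)
    {
        return c.R > 200 && c.G < 80 && c.B < 80;
    }

    // wmfBase64 is the "wmf" field of the result message.
    public static bool ContainsRed(string wmfBase64)
    {
        byte[] bytes = Convert.FromBase64String(wmfBase64);
        using (var stream = new MemoryStream(bytes))
        using (var metafile = new Metafile(stream))
        using (var bitmap = new Bitmap(Math.Max(1, metafile.Width), Math.Max(1, metafile.Height)))
        {
            // Draw the vector image onto a white background.
            using (Graphics g = Graphics.FromImage(bitmap))
            {
                g.Clear(Color.White);
                g.DrawImage(metafile, 0, 0, bitmap.Width, bitmap.Height);
            }
            // Scan every pixel; formula images are small, so GetPixel is acceptable here.
            for (int y = 0; y < bitmap.Height; y++)
                for (int x = 0; x < bitmap.Width; x++)
                    if (IsReddish(bitmap.GetPixel(x, y)))
                        return true;
            return false;
        }
    }
}
```

A formula that legitimately uses red text would be flagged too, so a check like this only suits content where red is not expected.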

 

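This is a preprocessing approach: the formulas of all questions have to be converted before any download happens, and the converted MathType data is stored in Redis or MySQL. The data volume is fairly large; if the download response time allows it, the data can be kept in MySQL, with the MD5 of the LaTeX string as the primary key.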
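
Computing the key is straightforward. A minimal sketch, assuming the LaTeX string is hashed as UTF-8 without any normalization (`EquationKey` is a hypothetical name):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EquationKey
{
    // Lowercase hex MD5 of the UTF-8 bytes of the LaTeX source.
    public static string Md5Hex(string latex)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(latex));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }
}
```

Two LaTeX strings that differ only in whitespace produce different keys under this scheme, so the producer and the consumer have to hash exactly the same text.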

For the approach to downloading HTML questions as Word documents, see another post: https://www.cnblogs.com/maoyuwei/p/11637738.html

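To display the formula in Word, write the wmf and ole data to word/media/ and word/embeddings/ respectively, then put the relationship IDs (rId) you defined in the positions marked rId10 and rId11 in the object markup, and Word renders the formula in MathType style.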
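
The exact markup is not reproduced in this post. As a rough sketch, assumed from what Word typically writes for an embedded MathType object, the relationship entries and the object run could be generated as follows. `MathTypeOoxml`, the file names and the shape ID are hypothetical; Word-generated documents usually carry extra attributes and a v:shapetype definition that are omitted here.

```csharp
using System;
using System.Globalization;

public static class MathTypeOoxml
{
    private const string RelNs =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Entries for word/_rels/document.xml.rels: one for the WMF preview, one for the OLE binary.
    public static string Relationships(string imageRid, string oleRid, string wmfName, string oleName)
    {
        return
            $"<Relationship Id=\"{imageRid}\" Type=\"{RelNs}/image\" Target=\"media/{wmfName}\"/>\n" +
            $"<Relationship Id=\"{oleRid}\" Type=\"{RelNs}/oleObject\" Target=\"embeddings/{oleName}\"/>";
    }

    // Run for word/document.xml: the image rId goes on v:imagedata, the OLE rId on o:OLEObject.
    public static string ObjectRun(string imageRid, string oleRid, string shapeId,
                                   double widthPt, double heightPt)
    {
        string w = widthPt.ToString("0.##", CultureInfo.InvariantCulture);
        string h = heightPt.ToString("0.##", CultureInfo.InvariantCulture);
        return
            "<w:r><w:object>" +
            $"<v:shape id=\"{shapeId}\" type=\"#_x0000_t75\" style=\"width:{w}pt;height:{h}pt\" o:ole=\"\">" +
            $"<v:imagedata r:id=\"{imageRid}\" o:title=\"\"/>" +
            "</v:shape>" +
            $"<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" ShapeID=\"{shapeId}\" " +
            $"DrawAspect=\"Content\" r:id=\"{oleRid}\"/>" +
            "</w:object></w:r>";
    }
}
```

For example, `MathTypeOoxml.ObjectRun("rId10", "rId11", "_x0000_i1025", 20, 15)` yields a run whose image reference is rId10 and whose OLE reference is rId11, matching the two positions mentioned above.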

 

 

 

 

C# code for the LaTeX to MathType conversion:

https://github.com/mao-yuwei/latex-to-mathtype

 

posted on 2022-03-01 17:55  maoooooo
