﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>博客园-xingd.net-最新评论</title><link>http://www.cnblogs.com/xingd/CommentsRSS.aspx</link><description>.net related techonology</description><language>zh-cn</language><pubDate>Sat, 22 Mar 2008 07:56:23 GMT</pubDate><lastBuildDate>Sat, 22 Mar 2008 07:56:23 GMT</lastBuildDate><generator>cnblogs</generator><item><title>Re:再度提升!.NET脏字过滤算法</title><link>http://www.cnblogs.com/xingd/archive/2010/08/03/1061800.html#1886225</link><dc:creator>木木林</dc:creator><author>木木林</author><pubDate>Tue, 03 Aug 2010 08:52:34 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2010/08/03/1061800.html#1886225</guid><description><![CDATA[@时尚品牌
I ran a test and found a small bug in the program, in this line:
 string sub = text.Substring(index, j + 1);
[code=java]
string sub = text.Substring(index, j + 1);
if (hash.Contains(sub))
{
    return true;
}
[/code]
It should be text.Substring(index, j + 1 + index);
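Whether the second argument needs the extra index depends on the language's substring semantics. The following is only a minimal sketch, assuming the snippet is ported to Java as the [code=java] tag suggests: Java's String.substring(beginIndex, endIndex) takes an exclusive end index, so the start offset has to be added. In C#, String.Substring(startIndex, length) takes a length, so j + 1 alone already selects j + 1 characters there. SubstringDemo and window are hypothetical names used only for illustration.

```java
// Hypothetical helper, not from the original post: extracts the (j + 1)-character
// window that starts at `index`, using Java's substring(beginIndex, endIndex).
final class SubstringDemo {
    static String window(String text, int index, int j) {
        // The second argument is an exclusive END INDEX in Java,
        // so the start offset must be added to the window length.
        return text.substring(index, index + j + 1);
    }

    public static void main(String[] args) {
        String text = "abcdef";
        // Window of length 2 starting at index 2.
        System.out.println(window(text, 2, 1)); // prints "cd"
    }
}
```

With index = 2 and j = 1, passing j + 1 directly as the end index would ask Java for the range from 2 to 2, which is an empty string.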
<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">木木林</a> 2010-08-03 16:52 <a href="http://www.cnblogs.com/xingd/archive/2010/08/03/1061800.html#1886225#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Re:Minesweeper: GDI+ Line Scratch</title><link>http://www.cnblogs.com/xingd/archive/2010/06/22/1076939.html#1853988</link><dc:creator>FlowerJack</dc:creator><author>FlowerJack</author><pubDate>Tue, 22 Jun 2010 12:09:40 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2010/06/22/1076939.html#1853988</guid><description><![CDATA[[quote]Fireman_duck: I thought this would be something profound&lt;br&gt;Very disappointed [/quote]
I strongly object to that remark. I support the OP.<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">FlowerJack</a> 2010-06-22 20:09 <a href="http://www.cnblogs.com/xingd/archive/2010/06/22/1076939.html#1853988#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Re:String与StringBuilder</title><link>http://www.cnblogs.com/xingd/archive/2010/06/17/102243.html#1849995</link><dc:creator>codefor</dc:creator><author>codefor</author><pubDate>Thu, 17 Jun 2010 14:18:48 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2010/06/17/102243.html#1849995</guid><description><![CDATA[After a StringBuilder's Length is reset, is that block of memory released, or does it stay allocated?<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">codefor</a> 2010-06-17 22:18 <a href="http://www.cnblogs.com/xingd/archive/2010/06/17/102243.html#1849995#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Re:(再发).NET脏字过滤算法 </title><link>http://www.cnblogs.com/xingd/archive/2009/10/22/1060425.html#1678610</link><dc:creator>哥哥.Net</dc:creator><author>哥哥.Net</author><pubDate>Thu, 22 Oct 2009 07:30:47 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/10/22/1060425.html#1678610</guid><description><![CDATA[This is a good approach, and I happened to need it today. There is one small problem:

"mark" and "Mark" are treated as two different bad words. Could I be using it incorrectly?<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">哥哥.Net</a> 2009-10-22 15:30 <a href="http://www.cnblogs.com/xingd/archive/2009/10/22/1060425.html#1678610#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Re:再度提升!.NET脏字过滤算法</title><link>http://www.cnblogs.com/xingd/archive/2009/07/16/1061800.html#1586608</link><dc:creator>xiangxiang</dc:creator><author>xiangxiang</author><pubDate>Thu, 16 Jul 2009 07:37:40 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/07/16/1061800.html#1586608</guid><description><![CDATA[@OP:

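To be honest, it is very fast, but the matching is unreliable. I tested it.

Test environment:

1000 strings of roughly 100K each, filtered against 1340 keywords

With IndexOf, it took about 39671.875 ms
With your method, it took 62.5 ms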

That is a speedup of more than 500 times. Unfortunately, many keywords are missed: with IndexOf, 22 keywords are matched in one article, but your method matches only 18, and the other 4 really are present in the string. If the matching is not accurate, the speed is of no use to me, however fast it is. Forgive my slowness, but I really cannot follow what you wrote, so I have no idea where to start changing it. It is frustrating that something this good is unusable for me.<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">xiangxiang</a> 2009-07-16 15:37 <a href="http://www.cnblogs.com/xingd/archive/2009/07/16/1061800.html#1586608#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>re: 再度提升!.NET脏字过滤算法</title><link>http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566409</link><dc:creator>wulao</dc:creator><author>wulao</author><pubDate>Tue, 23 Jun 2009 09:36:20 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566409</guid><description><![CDATA[I hope the commenters above can offer some guidance.<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">wulao</a> 2009-06-23 17:36 <a href="http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566409#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>re: 再度提升!.NET脏字过滤算法</title><link>http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566406</link><dc:creator>wulao</dc:creator><author>wulao</author><pubDate>Tue, 23 Jun 2009 09:35:25 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566406</guid><description><![CDATA[When I replace sensitive words with the algorithm from <a href="http://www.cnblogs.com/xingd/archive/2008/01/31/1060425.html" target="_new" rel="nofollow">http://www.cnblogs.com/xingd/archive/2008/01/31/1060425.html</a>, every sensitive word in my bad-word dictionary is matched, but with this method the second half of the bad words is not replaced. Why??<br/>Strange.<br/>HashSet&lt;string&gt; hash = new HashSet&lt;string&gt;();<br/>            byte[] fastCheck = new byte[char.MaxValue];<br/>            byte[] fastLength = new byte[char.MaxValue];<br/>            BitArray charCheck = new BitArray(char.MaxValue);<br/>            BitArray endCheck = new BitArray(char.MaxValue);<br/>    
        int maxWordLength = 0;<br/>            int minWordLength = int.MaxValue;<br/><br/><br/>            string wordPath = ConfigurationManager.AppSettings[&quot;badWords&quot;];<br/>            string badWordTxtPath = System.Web.HttpContext.Current.Server.MapPath(wordPath);<br/>            string[] badWords = null;<br/>            if (System.IO.File.Exists(badWordTxtPath))<br/>            {<br/>                StreamReader sr = new StreamReader(badWordTxtPath, Encoding.Default);<br/>                badWords = sr.ReadToEnd().Split('|');<br/>                //初始化脏字典<br/>                foreach (string word in badWords)<br/>                {<br/>                    maxWordLength = Math.Max(maxWordLength, word.Length);<br/>                    minWordLength = Math.Min(minWordLength, word.Length);<br/><br/>                    for (int i = 0; i &lt; 7 &amp;&amp; i &lt; word.Length; i++)<br/>                    {<br/>                        fastCheck[word[i]] |= (byte)(1 &lt;&lt; i);<br/>                    }<br/><br/>                    for (int i = 7; i &lt; word.Length; i++)<br/>                    {<br/>                        fastCheck[word[i]] |= 0x80;<br/>                    }<br/><br/>                    if (word.Length == 1)<br/>                    {<br/>                        charCheck[word[0]] = true;<br/>                    }<br/>                    else<br/>                    {<br/>                        fastLength[word[0]] |= (byte)(1 &lt;&lt; (Math.Min(7, word.Length - 2)));<br/>                        endCheck[word[word.Length - 1]] = true;<br/><br/>                        hash.Add(word);<br/>                    }<br/><br/>                }<br/>                //判断脏字是否出现在一个字符串中<br/>                int index = 0;<br/>                while (index &lt; strContent.Length)<br/>                {<br/>                    int count = 1;<br/><br/>                    if (index &gt; 0 || (fastCheck[strContent[index]] 
&amp; 1) == 0)<br/>                    {<br/>                        while (index &lt; strContent.Length - 1 &amp;&amp; (fastCheck[strContent[++index]] &amp; 1) == 0) ;<br/>                    }<br/><br/>                    char begin = strContent[index];<br/><br/>                    if (minWordLength == 1 &amp;&amp; charCheck[begin])<br/>                    {<br/>                        break ;<br/>                    }<br/><br/>                    for (int j = 1; j &lt;= Math.Min(maxWordLength, strContent.Length - index - 1); j++)<br/>                    {<br/>                        char current = strContent[index + j];<br/><br/>                        if ((fastCheck[current] &amp; 1) == 0)<br/>                        {<br/>                            ++count;<br/>                        }<br/><br/>                        if ((fastCheck[current] &amp; (1 &lt;&lt; Math.Min(j, 7))) == 0)<br/>                        {<br/>                            break;<br/>                        }<br/><br/>                        if (j + 1 &gt;= minWordLength)<br/>                        {<br/>                            if ((fastLength[begin] &amp; (1 &lt;&lt; Math.Min(j - 1, 7))) &gt; 0 &amp;&amp; endCheck[current])<br/>                            {<br/>                                string sub = strContent.Substring(index, j + 1);<br/><br/>                                if (hash.Contains(sub) )<br/>                                {<br/>                                    strContent = strContent.Replace(sub, &quot;**敏感词汇已替换**&quot;);<br/>                                    break;<br/>                                }<br/>                            }<br/>                        }<br/>                    }<br/><br/>                    index += count;<br/>                }<br/><br/>                return strContent;<br/><br/><br/>            }<br/>            else<br/>            {<br/>                return 
&quot;脏字典文件不存在！&quot;;<br/>                <br/>            }<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">wulao</a> 2009-06-23 17:35 <a href="http://www.cnblogs.com/xingd/archive/2009/06/23/1061800.html#1566406#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>re: (重发).NET脏字过滤算法 </title><link>http://www.cnblogs.com/xingd/archive/2009/05/30/1050443.html#1541956</link><dc:creator>jay tian</dc:creator><author>jay tian</author><pubDate>Sat, 30 May 2009 11:32:07 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/05/30/1050443.html#1541956</guid><description><![CDATA[Thanks to the OP for sharing your algorithm!<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">jay tian</a> 2009-05-30 19:32 <a href="http://www.cnblogs.com/xingd/archive/2009/05/30/1050443.html#1541956#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>re: (再发).NET脏字过滤算法 </title><link>http://www.cnblogs.com/xingd/archive/2009/05/15/1060425.html#1529103</link><dc:creator>ffesp</dc:creator><author>ffesp</author><pubDate>Fri, 15 May 2009 09:27:17 GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/05/15/1060425.html#1529103</guid><description><![CDATA[|= (byte)(1 &lt;&lt; i);<br/>What does this code mean? It makes my head spin.<br/>What is &quot;|=&quot;? I could not even find it on Google. And what does (1&lt;&lt;i) mean?<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">ffesp</a> 2009-05-15 17:27 <a href="http://www.cnblogs.com/xingd/archive/2009/05/15/1060425.html#1529103#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>re: 学习人物徐敬德：不上大学念自考的“怪”学生</title><link>http://www.cnblogs.com/xingd/archive/2009/05/15/107683.html#1528706</link><dc:creator>DotNet编程</dc:creator><author>DotNet编程</author><pubDate>Fri, 15 May 2009 04:41:31 
GMT</pubDate><guid>http://www.cnblogs.com/xingd/archive/2009/05/15/107683.html#1528706</guid><description><![CDATA[I admire your entrepreneurial spirit!<br/><br/>Still, I think it would have been better if you could have managed both work and university studies.<br/>That would have saved you from having to sit the self-study exams now.<br/><br/>Wishing you all the best!<br><br><div align=right><a style="text-decoration:none;" href="http://www.cnblogs.com/xingd/" target="_blank">DotNet编程</a> 2009-05-15 12:41 <a href="http://www.cnblogs.com/xingd/archive/2009/05/15/107683.html#1528706#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>
