
In the v1.2 release of the webinfo search engine's web-spider program I never had a good way to handle the duplicate pages it crawled. In v1.3 I am trying, whenever new data is added, to automatically delete any existing rows in the database that have the same page URL. Could someone more experienced take a look at the code below and tell me whether there are any problems with it?

    string text5 = row2["url"].ToString().Replace("'", "''");   // escape single quotes before splicing into the SQL text

    // First query: count how many rows in the current table already have this url.
    textArray2 = new string[5] { "select count(*) from ", sArray[i], " where url='", text5, "'" };
    text4 = string.Concat(textArray2);
    command1.CommandText = text4;
    int count = (int)command1.ExecuteScalar();
    command1.ExecuteNonQuery();   // CommandText is still the select, so this runs the same query a second time

    if (count != 0)
    {
        // Second query: delete the existing rows with the same url before the new page is stored.
        textArray3 = new string[5] { "delete from ", sArray[i], " where url='", text5, "'" };
        text3 = string.Concat(textArray3);
        Console.WriteLine(count + " duplicate pages deleted, this batch stored: " + wghtot);
        //Console.WriteLine(count);
        command1.CommandText = text3;
        command1.ExecuteNonQuery();
    }
    else
    {
        Console.WriteLine("new data: " + count);
    }
This code runs correctly, but it puts a heavy load on the MSSQL database server; when the volume of crawled data is very large, it can easily bring the server down.
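For comparison, here is a minimal sketch of one way to cut that load, assuming the table name still comes from a trusted list such as sArray: issue a single parameterized DELETE per row and use its return value instead of the preliminary select count(*) (and the extra ExecuteNonQuery), so each URL costs one round trip instead of three. The DeleteDuplicate helper and the tableName parameter are only illustration, not part of the original program:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class DuplicateUrlCleaner
    {
        // Deletes any existing rows with the same url before the new page is stored.
        // Assumes "conn" is already open; "tableName" and the "url" column are taken
        // from the snippet above.
        public static int DeleteDuplicate(SqlConnection conn, string tableName, string url)
        {
            // Table names cannot be parameterized, so tableName must come from a
            // trusted list (like sArray), never from crawled content.
            string sql = "delete from " + tableName + " where url = @url";

            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                // A parameter replaces the manual quote escaping and string splicing,
                // and lets SQL Server reuse the cached plan for every call.
                cmd.Parameters.Add("@url", SqlDbType.NVarChar, 400).Value = url;

                // ExecuteNonQuery reports how many rows were removed, so the separate
                // "select count(*)" round trip is no longer needed.
                return cmd.ExecuteNonQuery();
            }
        }
    }

With this shape the console message can simply key off the return value: greater than zero means duplicates were removed, zero means the URL is new. Putting a unique index (or primary key) on the url column would let SQL Server handle the duplicate check itself and should be cheaper still when the crawl volume is large.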


posted on 2005-01-09 11:41  蛟龙博客  Views(1745)  Comments(5)