
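Over the past couple of days I wrote a scraper that collects mobile phone listings from Newegg China (newegg.com.cn). The pages are downloaded with the WebClient class. To run it, call the Gather() method. I hope it is useful.

The code is as follows: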

/* 
 * Created By ChinaAgan 2012-1-18
 * 
 */
using System;
using System.Collections.Generic;
using System.Text;
using System.Collections;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

using CnBlogCollector.Properties;

namespace CnBlogCollector
{
    /// <summary>
    /// Data collection class
    /// </summary>
    public class Collector
    {
        #region Fields
        private string cnblogMain = "http://www.newegg.com.cn/SubCategory/1043-{0}.htm"; // Newegg category page URL template; {0} is the page index
        
        private WebClient wc = new WebClient(); 
        #endregion


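       #region Create directory
        /// <summary>
        /// Checks whether the directory exists and creates it if it does not.
        /// </summary>
        /// <param name="path">Directory path (relative or absolute)</param>
        /// <returns>The full path of the directory</returns>
        public string CreateFolderIfNot(string path)
        {
            // Resolve the full path of the directory
            string rtn = Path.GetFullPath(path);
            // If the directory does not exist
            if (!Directory.Exists(rtn))
            {
                // Create it
                Directory.CreateDirectory(rtn);
            }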
            return rtn;
        }
        #endregion

       #region Collect web page data
       public void Gather(int startIndex, int endIndex)
       {
           WebProxy webProxy = new WebProxy("proxy.cn1.global.ctrip.com:8080");
           webProxy.Credentials = new System.Net.NetworkCredential("jian_chen", "1qaz2wsx#");
           wc.Proxy = webProxy;

           string outContent = "";
           // Walk the category pages from startIndex (inclusive) to endIndex (exclusive)
           for (int i = startIndex; i < endIndex; i++)
           {
               // Download the page, decode the GB2312 bytes into a string and strip the line breaks
               string url = string.Format(cnblogMain, i.ToString());
               string mainData = Encoding.GetEncoding("GB2312").GetString(wc.DownloadData(url)).Replace("\r\n", "");

               string strPattern = @"<p\s+class=""info""><a\s+href=(?<url>.+?)\s+title=""(?<title>.+?)"">(?<content>.+?)</a>";
               string oldPricePattern = @"<p\s+class=""bypast""><span>¥(?<OldPrice>.+?)</span></p>";
               string newPricePattern = @"<p\s+class=""current""><strong\s+class=""price""><span>¥</span>(?<NewPrice>\d+?\..+?)</strong></p>";

               List<string> nameList = new List<string>();
               List<string> oldPriceList = new List<string>();
               List<string> newPriceList = new List<string>();
               string oldPrice = String.Empty;
               string newPrice = String.Empty;

               MatchCollection MatchesName = Regex.Matches(mainData, strPattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
               MatchCollection MatchesOldPrice = Regex.Matches(mainData, oldPricePattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
               MatchCollection MatchesNewPrice = Regex.Matches(mainData, newPricePattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

               foreach (Match NextMatch in MatchesName)
               {
                   nameList.Add(NextMatch.Groups["content"].Value);
               }

               foreach (Match NextMatch in MatchesOldPrice)
               {
                   oldPriceList.Add(NextMatch.Groups["OldPrice"].Value);
               }

               foreach (Match NextMatch in MatchesNewPrice)
               {
                   newPriceList.Add(NextMatch.Groups["NewPrice"].Value);
               }

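               for (int iLen = 0; iLen < nameList.Count; iLen++)
               {
                   // A listing may lack one of the prices, so guard the index to avoid an ArgumentOutOfRangeException
                   oldPrice = iLen < oldPriceList.Count ? oldPriceList[iLen] : String.Empty;
                   newPrice = iLen < newPriceList.Count ? newPriceList[iLen] : String.Empty;
                   outContent += String.Format("Phone name: {0}, original price: {1}, current price: {2}", nameList[iLen], oldPrice, newPrice) + "\r\n";
               }

               // TODO: handle the current price and character entities such as &32;.
               // Path.Combine adds the directory separator if OutPath does not end with one
               string pth = Path.Combine(CreateFolderIfNot(Settings.Default.OutPath), i + ".txt");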
               if (File.Exists(pth))
               {
                   File.Delete(pth);  
               }

               File.AppendAllText(pth, outContent, Encoding.GetEncoding("GB2312"));

               outContent = "";
           }
       } 
       #endregion
    }
}
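
The extraction step can be tried without any network access. The following is a minimal sketch that applies the same title pattern as Gather() to a hand-written HTML fragment. The sample markup, the class name ListingParser and the method name ExtractNames are made up for illustration; the real Newegg pages may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListingParser
{
    // Same title pattern as in Gather() above.
    private const string NamePattern =
        @"<p\s+class=""info""><a\s+href=(?<url>.+?)\s+title=""(?<title>.+?)"">(?<content>.+?)</a>";

    // Returns the link text of every product entry found in the HTML.
    public static List<string> ExtractNames(string html)
    {
        List<string> names = new List<string>();
        foreach (Match m in Regex.Matches(html, NamePattern, RegexOptions.IgnoreCase))
        {
            names.Add(m.Groups["content"].Value);
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample markup, not copied from the real site.
        string sample =
            "<p class=\"info\"><a href=\"/p/1.htm\" title=\"Phone A\">Phone A</a></p>" +
            "<p class=\"info\"><a href=\"/p/2.htm\" title=\"Phone B\">Phone B</a></p>";

        foreach (string name in ExtractNames(sample))
        {
            Console.WriteLine(name);
        }
    }
}
```

Testing the patterns against a saved fragment like this makes it easier to see when a site redesign breaks them.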

 

posted @ 2012-01-18 10:14 chinaagan Views(49) Comments(2)

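Today I was helping a new colleague set up a .NET environment and we ran into this error:

Could not load file or assembly 'System.Runtime.Serialization, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. The module was expected to contain an assembly manifest.

Searching Google did not turn up the cause, which was frustrating for quite a while.

Eventually I found a helpful post on cnblogs. I first checked the IIS configuration:

the site was running on version 2.0.50727,

so I located the configuration file for that version, C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\CONFIG\web.config.

Then I deleted the following assembly reference entry from that web.config: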

<add assembly="System.Runtime.Serialization, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089, processorArchitecture=MSIL"/>

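After that, the page no longer reported the error.

Cause:

This web.config is a shared configuration file. It is the reason a project can use System.Runtime.Serialization without referencing it explicitly.

The shared configuration resolves the assembly from C:\WINDOWS\assembly. If the assembly cannot be loaded from there, or the version found is wrong, the error above occurs.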
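
Before editing the shared configuration, it can help to confirm that the assembly itself is the problem. The following is a minimal sketch, assuming a small console program run on the affected machine, that tries to load the assembly by its full name and reports what happened. The class name AssemblyProbe and the method name Probe are made up for illustration; a "module was expected to contain an assembly manifest" failure typically surfaces as a BadImageFormatException.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class AssemblyProbe
{
    // Tries to load the assembly and returns a one-line description of the outcome.
    public static string Probe(string assemblyName)
    {
        try
        {
            Assembly asm = Assembly.Load(assemblyName);
            return "Loaded: " + asm.FullName;
        }
        catch (BadImageFormatException ex)
        {
            // The file was found but is not a valid managed assembly.
            return "Bad image: " + ex.Message;
        }
        catch (FileNotFoundException ex)
        {
            return "Not found: " + ex.Message;
        }
        catch (FileLoadException ex)
        {
            // Found, but could not be loaded (for example a version mismatch).
            return "Load failed: " + ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Probe(
            "System.Runtime.Serialization, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```

If the probe reports a bad image, repairing the .NET Framework 3.0 installation may be a less invasive fix than removing the entry from the shared web.config.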

posted @ 2012-01-11 21:51 chinaagan Views(36) Comments(0)
Reposted from: http://blog.csdn.net/zheng_/archive/2010/03/04/5344472.aspx
 
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Reflection;
 

namespace TestAssembly
{
    public class TestClass : ITestInterface
    {
        public TestClass() { }
        public double TestMethod(double param)
        {
            return param * 0.75;
        }
    }
 
    public interface ITestInterface
    {
        double TestMethod(double param);
    }
 
    class Program
    {
        public static void Main(string[] args)
        {
            int n = 1000;
            Test1(n);// direct call
            Test2(n);// call through InvokeMember
            Test3(n);// call through an interface
            Test4(n);// bind to a delegate
            Console.In.ReadLine();
        }
 

        public delegate double TestDelegate(double param);
 
        // First, measure the cost of calling the method directly:
        static void Test1(int n)
        {
            TestClass tc = new TestClass();
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    tc.TestMethod(1.0);
            TimeSpan ts = DateTime.Now - startTime;
            Console.Out.WriteLine("Test1: " + ts);
        }
 
        // Next, call the method through InvokeMember:
        static void Test2(int n)
        {
            Type testType = typeof(TestClass);
            object obj = testType.InvokeMember(null,BindingFlags.CreateInstance, null, null, null);
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    testType.InvokeMember("TestMethod", BindingFlags.InvokeMethod, null, obj, new object[] { 1.0 });
            TimeSpan ts = DateTime.Now - startTime;
            Console.Out.WriteLine("Test2: " + ts);
        }
 
        // Then, cast the created object to the interface and call the method through it:
        static void Test3(int n)
        {
            Type testType = typeof(TestClass);
 
            object obj = testType.InvokeMember(null,BindingFlags.CreateInstance, null, null, null);
 
            ITestInterface instance = (ITestInterface)obj;
 
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    instance.TestMethod(1.0);
            TimeSpan ts = DateTime.Now - startTime;
            Console.Out.WriteLine("Test3: " + ts);
        }
 

        // Finally, bind the method to a delegate and call it through the delegate:
        static void Test4(int n)
        {
            Type testType = typeof(TestClass);
            object obj = testType.InvokeMember(null,BindingFlags.CreateInstance, null, null, null);
 
            TestDelegate testMethod = (TestDelegate)Delegate.CreateDelegate(typeof(TestDelegate), obj, "TestMethod");
 
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    testMethod(1.0);
            TimeSpan ts = DateTime.Now - startTime;
            Console.Out.WriteLine("Test4: " + ts);
        }
    }
}
 
Run the program and compare the timings of Test1, Test2, Test3 and Test4.
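
The tests above take their timings from DateTime.Now, which has a fairly coarse resolution. The following is a minimal sketch of the same kind of measurement using System.Diagnostics.Stopwatch instead. The class name Timing and the method name Measure are made up for illustration and are not part of the original post.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action 'iterations' times and returns the elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // one warm-up call so JIT compilation is not included in the timing

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        double sink = 0; // accumulate a result so the work has an observable effect

        TimeSpan elapsed = Measure(() => sink += 1.0 * 0.75, 1000000);

        Console.WriteLine("Direct arithmetic: " + elapsed);
        Console.WriteLine("Sink: " + sink);
    }
}
```

Note that Measure adds one delegate call per iteration, so it is better suited to comparing the slower call styles than to timing the direct call in isolation.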
posted @ 2010-09-13 16:26 chinaagan Views(64) Comments(0)
 
Reference: http://www.cnblogs.com/qingteng1983/archive/2010/07/25/1784528.html How many of these have you used in your own projects?
posted @ 2010-08-19 10:42 chinaagan Views(5) Comments(0)
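On 2010-8-19 I put together a multithreading example. It covers a thread-safe singleton (method: InstanceSingleTon) and a multithreaded counter (method: Count).
The code is fairly simple. Its purpose is to practice the singleton pattern and the concepts of synchronized and unsynchronized access from multiple threads.
The code is below:
using System;
using System.Threading;

 class Program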
    {
        static void Main(string[] args)
        {
 
            Thread[] threads = new Thread[10];
 
            for (int i = 0; i < 10; i++)
            {
                threads[i] = new Thread(new ThreadStart(SingleTon.Count));
            }
 
            foreach (Thread t in threads)
            {
                t.Start();
            }
 
            Console.ReadLine();
        }
    }
 

    public class SingleTon
    {
        static int intIn = 0;
 
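        // volatile, so that the double-checked locking below never sees a partially constructed instance
        public static volatile SingleTon instance;
        private static readonly object lockHelper = new object();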
 
        private SingleTon()
        {
           
        }
 
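        /// <summary>
        /// Counter. The commented-out lock block synchronizes the threads on a lock object, so that the values are printed in order: 1,2,3,4,5,6,7,8,9,10.
        /// The code as it stands is unsynchronized, so values can appear out of order or be repeated.
        /// </summary>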
        public static void Count()
        {
            //lock (lockHelper)
            //{
            //    Console.WriteLine("Current:" + Convert.ToString(++intIn));
            //}
 
            Console.WriteLine("Current:" + Convert.ToString(++intIn));
          
        }
 
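        /// <summary>
        /// Singleton accessor. Double-checked locking prevents several threads from creating more than one instance.
        /// </summary>
        /// <returns>The single shared instance</returns>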
        public static SingleTon InstanceSingleTon()
        {
            if (instance == null)
            {
                lock (lockHelper)
                {
                    if (instance==null)
                    {
                        instance = new SingleTon();
                    }
                }
            }
            return instance;
        }
    }
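
When only the increment needs to be safe, a lock is not the only option. The following is a minimal sketch, assuming the goal is just a correct count, that uses Interlocked.Increment so that each thread gets a distinct value. The class name SafeCounter and its methods are made up for illustration. Unlike the lock version in Count(), this does not guarantee that the values are printed in order, because the increment and the Console.WriteLine call are separate steps.

```csharp
using System;
using System.Threading;

public static class SafeCounter
{
    private static int count = 0;

    // Atomically increments the counter and returns the new value.
    public static int Next()
    {
        return Interlocked.Increment(ref count);
    }

    // Starts 'threadCount' threads that each increment once, waits for all of them,
    // and returns the final counter value.
    public static int RunThreads(int threadCount)
    {
        count = 0;
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() => Console.WriteLine("Current:" + Next()));
        }
        foreach (Thread t in threads)
        {
            t.Start();
        }
        foreach (Thread t in threads)
        {
            t.Join(); // wait for completion instead of blocking on Console.ReadLine
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine("Final: " + RunThreads(10));
    }
}
```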
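posted @ 2010-08-19 09:58 chinaagan Views(328) Comments(2)
    This post is password protected.
posted @ 2010-08-13 09:50 chinaagan Views(0) Comments(0)
    This post is password protected.
posted @ 2010-08-12 16:18 chinaagan Views(5) Comments(0)
Summary: A couple of days ago I interviewed at Company A and was stumped by the interviewer's questions, which was frustrating. I have answered them again here to share. 1) Q: What project have you worked on recently? Give a rough overview. A: I built http://www.chinatravel.net. Two people worked on it: one was the designer and the other was me. I handled the table design, the coding and the architecture on my own. It is split into a 6-layer architecture, as follows: 2) In the project's data cache, if there is an emergency and the cache has to be cleared, how do you handle it? (The cost of restarting IIS...
posted @ 2009-09-27 22:48 chinaagan Views(458) Comments(5)
posted @ 2009-02-28 23:45 chinaagan Views(643) Comments(0)
posted @ 2009-02-20 23:16 chinaagan Views(258) Comments(1)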