Scraping Web Page Content

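I have been working on web scraping, and some pages only show their content after a member has logged in. To deal with the login, I load the pages in a WebBrowser control. Some links, however, use target="_blank", so clicking them does not load the page inside the WebBrowser. One fix is to strip target="_blank" from the markup through the control's DocumentText property, so the linked page opens in the WebBrowser itself: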

webBrowser1.DocumentText = webBrowser1.DocumentText.Replace(@"target=""_blank""", "");
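One caveat with the replacement above: it only matches the exact text target="_blank" (double quotes, lower case), and assigning DocumentText reloads the page from a string, which as far as I know can stop relative links from resolving. A possible alternative is to rewrite the attribute on the loaded DOM instead. The following is only a minimal sketch under that assumption; the names LinkTargetFixer and Attach are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Windows.Forms;

static class LinkTargetFixer
{
    // Hypothetical helper: call once, e.g. LinkTargetFixer.Attach(webBrowser1);
    public static void Attach(WebBrowser browser)
    {
        // DocumentCompleted fires each time a document (or frame) finishes loading
        browser.DocumentCompleted += (sender, e) =>
        {
            HtmlDocument doc = ((WebBrowser)sender).Document;
            if (doc == null) return;

            // Links contains the hyperlink elements of the loaded document
            foreach (HtmlElement link in doc.Links)
            {
                string target = link.GetAttribute("target");
                if (string.Equals(target, "_blank", StringComparison.OrdinalIgnoreCase))
                {
                    // Point the link back at the current window
                    link.SetAttribute("target", "_self");
                }
            }
        };
    }
}
```

This leaves the page itself untouched, so the login session and the base URL stay as they were. Links added later by script would not be covered by this sketch.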

The following fetches a page's content by URL:

        using System.IO;
        using System.Net;
        using System.Text;

        /// <summary>
        /// Fetches the content at a URL.
        /// </summary>
        /// <param name="URL">The URL to fetch</param>
        /// <returns>The response body as a string</returns>
        private static string getURL(string URL)
        {
            WebRequest request = WebRequest.Create(URL); // create the WebRequest object
            request.Timeout = 60000; // time out after 1 minute

            // The using blocks release the response, stream, and reader even if reading fails;
            // exceptions propagate to the caller with their original stack trace.
            using (WebResponse response = request.GetResponse()) // get the WebResponse object
            using (Stream datastream = response.GetResponseStream()) // get the response stream
            using (StreamReader reader = new StreamReader(datastream, Encoding.Default))
            {
                return reader.ReadToEnd(); // read the data
            }
        }
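Note that Encoding.Default decodes the response with the system's ANSI code page, so a page served in a different charset can come out garbled. One way around this, sketched below, is to read the charset the server declares in its Content-Type header through HttpWebResponse.CharacterSet. The names PageFetcher, ResolveEncoding, and Fetch are hypothetical, and falling back to UTF-8 is my own assumption rather than anything the original code does.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageFetcher
{
    // Maps a charset name to an Encoding; falls back to UTF-8 (an assumed default)
    // when the name is missing or not recognised on this platform.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset)) return Encoding.UTF8;
        try
        {
            // Some servers wrap the charset value in quotes, so strip them
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        request.Timeout = 60000; // 1 minute, as in the original method

        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet is only available on HTTP responses
            HttpWebResponse http = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(http != null ? http.CharacterSet : null);

            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The header is not always reliable: some pages declare their charset only in a meta tag, and the value reported when the header omits a charset may differ between .NET versions, so treat this as a starting point rather than a complete solution.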

posted @ 2012-12-07 16:03  ellens