
Scraping data behind a login

2015-11-29 23:01 stoneniqiu

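    This time a client asked me to pull data from another website, including submitting data to it. All of these operations have to happen after logging in. Technically there is nothing hard here; the key is to use Fiddler to find the right parameters and URLs.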

Remembering the login state

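    HttpClient can keep the login state: its HttpClientHandler stores the cookies the site sets at login. So once you have logged in, you can save the HttpClient instance (here in Session) and reuse it for later requests.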

    private HttpClient _client;

    // One HttpClient per user session: its handler keeps the login cookies,
    // so later requests reuse the logged-in state.
    public HttpClient HttpClient
    {
        get
        {
            if (_client == null)
            {
                if (Session["Client"] != null)
                {
                    _client = Session["Client"] as HttpClient;
                }
                else
                {
                    var handler = new HttpClientHandler
                    {
                        AutomaticDecompression = DecompressionMethods.GZip,
                        UseCookies = true, // keep cookies such as the session id between requests
                        // Proxy (replace ip, username, pwd and domain with real values)
                        Proxy = new WebProxy("http://ip:8080/", true, null,
                            new NetworkCredential("username", "pwd", "domain"))
                    };
                    _client = new HttpClient(handler);
                    _client.DefaultRequestHeaders.Add("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36");
                    ClientLogin(new ClientLogoModel()); // log in once; the cookies stay in the handler
                    Session["Client"] = _client;
                }
            }

            return _client;
        }
    }
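The handler above creates its own CookieContainer by default, and that container is where the login state actually lives. If you want to check whether the login worked, or keep the cookies somewhere other than Session, one option is to create the container yourself and hold on to it. A minimal sketch; `LoggedInClient`, `Create` and `HasCookie` are hypothetical names, and the proxy settings are left out:

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class LoggedInClient
{
    // Creates an HttpClient whose handler stores cookies in a container we keep
    // a reference to, so the login cookie can be inspected (or saved) later.
    public static HttpClient Create(out CookieContainer cookies)
    {
        cookies = new CookieContainer();
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AutomaticDecompression = DecompressionMethods.GZip
        };
        return new HttpClient(handler);
    }

    // True if the container holds a cookie with the given name for that site.
    public static bool HasCookie(CookieContainer cookies, Uri site, string name)
    {
        return cookies.GetCookies(site)[name] != null;
    }
}
```

To try it without the real site, `jar.SetCookies(site, "ASP.NET_SessionId=abc123; path=/")` simulates the Set-Cookie header of a login response (the cookie name here is only an example); after a real login you would look for whatever cookie name the target site actually uses.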

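    The target site takes its parameters as JSON and returns JSON as well, rather than using ordinary form fields. So before posting, the parameters also have to be serialized to JSON, which is then sent as the value of a single form field named data.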

    public object ClientLogin(ClientLogoModel logoModel)
    {
        if (logoModel == null)
        {
            logoModel = new ClientLogoModel();
        }
        // Serialize the login model to JSON and send it as the value of the "data" form field
        var data = JsonConvert.SerializeObject(logoModel);
        var logoParams = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("data", data)
        };
        // Uses _client directly: this runs from the HttpClient getter, before the client is stored in Session
        var response = _client.PostAsync(new Uri(LogonUrl), new FormUrlEncodedContent(logoParams)).Result;
        var result = response.Content.ReadAsStringAsync().Result;
        return result;
    }
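Both ClientLogin and the GetTList method further down wrap their parameters the same way: JSON text as the value of one form field named data. A sketch that pulls this into one shared helper, plus a decoder for comparing the encoded body with what Fiddler's TextView shows. `DataForm`, `Build` and `DecodeDataField` are hypothetical names, and System.Text.Json stands in for Json.NET's `JsonConvert` only to keep the sketch free of NuGet packages:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;

public static class DataForm
{
    // Serialize the parameter object to JSON and send that JSON as the
    // value of a single url-encoded form field named "data".
    public static FormUrlEncodedContent Build(object parameters)
    {
        string json = JsonSerializer.Serialize(parameters);
        return new FormUrlEncodedContent(new[] { new KeyValuePair<string, string>("data", json) });
    }

    // Recover the JSON from an encoded body such as "data=%7B...%7D",
    // i.e. what the server does when it reads the "data" field.
    public static string DecodeDataField(string body)
    {
        const string prefix = "data=";
        if (!body.StartsWith(prefix, StringComparison.Ordinal))
            throw new FormatException("Body does not start with data=");
        // Form encoding turns spaces into '+', so undo that before percent-decoding.
        return Uri.UnescapeDataString(body.Substring(prefix.Length).Replace('+', ' '));
    }
}
```

With a shared helper, a change in the site's parameter convention only has to be made in one place.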

Converting the returned data

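Get the URL from the session list on the left side of Fiddler. On the right, the TextView in the upper half shows the parameter format and the TextView in the lower half shows the format of the returned data.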

Because this conversion is needed for every request, I wrote it as a generic method:

    public T GetTList<T>(object obj, string url)
    {
        // Same convention as the login: the parameters travel as JSON in the "data" form field
        var data = JsonConvert.SerializeObject(obj);
        var paramList = new List<KeyValuePair<string, string>> { new KeyValuePair<string, string>("data", data) };
        var response = HttpClient.PostAsync(new Uri(url), new FormUrlEncodedContent(paramList)).Result;

        // The reply is JSON too; deserialize it into the requested type
        var result = response.Content.ReadAsStringAsync().Result;
        return JsonConvert.DeserializeObject<T>(result);
    }
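GetTList blocks on `.Result` and deserializes whatever comes back, even an error page. A hedged async variant that checks the status code first might look like the following; `GetTListAsync` and the `FixedResponseHandler` test double are hypothetical names, and System.Text.Json replaces `JsonConvert` to keep the sketch self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class JsonFormApi
{
    // Async counterpart of GetTList<T>: posts obj as JSON in the "data"
    // form field and deserializes the JSON reply into T.
    public static async Task<T> GetTListAsync<T>(HttpClient client, string url, object obj)
    {
        string json = JsonSerializer.Serialize(obj);
        using var form = new FormUrlEncodedContent(new[] { new KeyValuePair<string, string>("data", json) });
        using HttpResponseMessage response = await client.PostAsync(url, form).ConfigureAwait(false);
        // Fail on an error status instead of trying to parse an error page as JSON.
        response.EnsureSuccessStatusCode();
        string body = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        return JsonSerializer.Deserialize<T>(body);
    }
}

// Test double: answers every request with a fixed status code and body,
// so the method can be exercised without the real site.
public sealed class FixedResponseHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _body;

    public FixedResponseHandler(HttpStatusCode status, string body)
    {
        _status = status;
        _body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Task.FromResult(new HttpResponseMessage(_status) { Content = new StringContent(_body) });
    }
}
```

Awaiting the call against a 500 response throws `HttpRequestException`. One difference from `JsonConvert`: System.Text.Json matches property names case-sensitively by default, so pass `new JsonSerializerOptions { PropertyNameCaseInsensitive = true }` if the site's field casing differs from your classes.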

Calling it:

    public ActionResult TradePage(TradeQueryParm param)
    {
        // Forward the query parameters to the target site and render the result as a partial view
        var data = GetTList<TradeRequstResult>(param, tradeListUrl);
        return PartialView(data);
    }

 The front end then posts the parameters to that action:

    $.post("/Trade/TradePage", {
        agentName: agentName, shortName: shortName,
        startDate: startDate, endDate: endDate, page: cpage
    }, function (data) {
        // The action returns a partial view; put the rendered HTML into the table container
        $("#mtable").html(data);
    });
Uploading an image with HttpClient:
    private string UploadImage(string fileName, string path)
    {
        // using blocks make sure the file handle is released after the upload
        using (FileStream aFile = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (MultipartFormDataContent form = new MultipartFormDataContent())
        {
            var content = new StreamContent(aFile);
            content.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
            content.Headers.ContentDisposition = new ContentDispositionHeaderValue("form-data")
            {
                Name = "protocolFile", // form field name used by the target site's upload
                FileName = fileName
            };
            form.Add(content);
            var response = HttpClient.PostAsync(imgLoadUrl, form).Result;
            return response.Content.ReadAsStringAsync().Result;
        }
    }
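A sketch that separates building the multipart body from sending it, so the body can be checked without the target site, and awaits the request instead of blocking. `ImageUpload`, `BuildForm` and `UploadAsync` are hypothetical names; the field name protocolFile and the image/jpeg content type come from the code above:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class ImageUpload
{
    // Builds the multipart body the same way as UploadImage: one JPEG part in
    // the "protocolFile" field, with Content-Disposition set by hand.
    public static MultipartFormDataContent BuildForm(Stream image, string fileName)
    {
        var part = new StreamContent(image);
        part.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
        part.Headers.ContentDisposition = new ContentDispositionHeaderValue("form-data")
        {
            Name = "protocolFile",
            FileName = fileName
        };
        var form = new MultipartFormDataContent();
        form.Add(part);
        return form;
    }

    // Async upload: the file, the form and the response are disposed when the method ends.
    public static async Task<string> UploadAsync(HttpClient client, string url, string path)
    {
        using FileStream file = File.OpenRead(path);
        using MultipartFormDataContent form = BuildForm(file, Path.GetFileName(path));
        using HttpResponseMessage response = await client.PostAsync(url, form).ConfigureAwait(false);
        return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
    }
}
```

Setting ContentDisposition by hand keeps the header to name and filename. The shorter `MultipartFormDataContent.Add(content, name, fileName)` overload also emits a `filename*` parameter, which is worth comparing against the Fiddler capture if the server rejects the upload.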