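Python list comprehensions in practice

The example is a Python script I actually used at work. Its purpose is to scrape detailed product information from a shopping site. I am recording it here simply as a footprint of what I did: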

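1. Visit and analyze the site

  a. Requirements: get the price, color, size, the image for each color, name, inventory, and currency unit. The hardest part is working out how these relate to one another; here everything is keyed by color.

  b. From experience, a site's product details in JSON format are usually either embedded in the HTML document or returned by a separate Ajax request and stored in JS. Failing that, the site offers a single price, color, and size per product (one color, one price, one size).

  c. Analysis shows that this site hides the JSON in an Ajax request, for example: http://www.xxx.com/ajax/productDetails.jsp?productCode=23813&xxx=xxx... Testing and trimming the parameters shows that only one of them (prodCode) really matters.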
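Step c can be sketched as follows. This is a minimal sketch, assuming the endpoint path that the script later in this post uses (`/browse2/ajax/product_details_ajax.jsp`); the helper name `build_details_url` is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint path as it appears in the script later in this post; site-specific.
DETAILS_PATH = '/browse2/ajax/product_details_ajax.jsp'


def build_details_url(page_url, prod_code):
    """Build the product-details Ajax URL on the same host as page_url.

    Hypothetical helper: only the prodCode parameter is sent, since the
    analysis found it to be the only one that matters.
    """
    parts = urlsplit(page_url)
    domain = '%s://%s' % (parts.scheme, parts.netloc)
    return domain + DETAILS_PATH + '?' + urlencode({'prodCode': prod_code})


url = build_details_url(
    'https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp',
    '09256')
print(url)
```

Deriving the host from the page URL keeps the request on the same domain as the page that was scraped, which is what the script does with `self.domain`.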

2. Analyze the content returned by the Ajax request:

<script>
window.getJcrewNameSpaceLegacy('globalObj.jcrew.browse.fullscreen')
globalObj.jcrew.browse.fullscreen.products = globalObj.jcrew.browse.fullscreen.products || [{
    images: []
}]
globalObj.jcrew.browse.fullscreen.products[0].productName = 'Factory heathered sweatshirt sweater'
</script>
<section id="description" class="description">
    <span class="item-num">item 09256</span>
    <div id="BVRRSummaryContainer"></div>
    <div id="variants">
        <div class="variant-wrapper">
            <div class="float-left">
                <input type="radio" name="variants" data-variant="09256" data-varianturl="https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp" data-index="" value="845524441838050" class="product-details-variants" checked />
            </div>
            <div class="product-pricing">
                <span class="notranslate"> Regular</span>
                <br />
                <div class="product-pricing-wrapper">
                    <span class="text-was">valued at</span>
                    <span class=" full-price   price-soldout  notranslate">  $59.50</span>
                    <span class="selected-color">
                        your price
                        <span class="selected-color-price notranslate">$29.50</span>
                    </span>
                </div>
            </div>
            <div class="clear"></div>
        </div>
        <div class="variant-wrapper">
            <div class="float-left">
                <input type="radio" name="variants" data-variant="B7151" data-varianturl="https://factory.jcrew.com/mens_special_sizes/tall/sweaters/PRD~B7151/B7151.jsp" data-index="" value="845524441838050" class="product-details-variants" />
            </div>
            <div class="product-pricing">
                <span class="notranslate"> Tall</span>
                <br />
                <div class="product-pricing-wrapper">
                    <span class="text-was">valued at</span>
                    <span class="  price-soldout  notranslate">  $64.50</span>
                    <span class="selected-color">
                        your price
                        <span class="selected-color-price notranslate">$32.00</span>
                    </span>
                </div>
            </div>
            <div class="clear"></div>
        </div>
    </div>
</section>
<div class="color-title">Color:
    <span class="color-name">
        hthr indigo
    </span>
</div>
<div id="priceWrapper0" class="price-wrapper">
    <div class="product-detail-price sale-price first-item notranslate">
        $29.50
    </div>
    <section id="color1" class="color-row last-row">
        <div class="color-box " data-color="MF3369" data-productcode="09256" data-index="">
            <a id="MF3369">
                <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
            </a>
        </div>
        <script>
        globalObj.jcrew.browse.fullscreen.products[0].images.push({
            type: 'color',
            identifier: 'MF3369',
            url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$',
            thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_tn75$'
        })
        </script>
        <div class="color-box " data-color="GY7314" data-productcode="09256" data-index="">
            <a id="GY7314">
                <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
            </a>
        </div>
        <script>
        globalObj.jcrew.browse.fullscreen.products[0].images.push({
            type: 'color',
            identifier: 'GY7314',
            url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$',
            thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_tn75$'
        })
        </script>
        <div class="color-box selected" data-color="BL8362" data-productcode="09256" data-index="">
            <a id="BL8362">
                <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
            </a>
        </div>
        <script>
        globalObj.jcrew.browse.fullscreen.products[0].images.push({
            type: 'color',
            identifier: 'BL8362',
            url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$',
            thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_tn75$'
        })
        </script>
        <div class="clear"></div>
    </section>
    <hr class="last-row">
</div>
<section id="sizes" class="sizes">
    <header>
        <h2>Size:</h2>
        <span><a class="product-details-sizechart" data-sizechart="0,0" href="javascript:void(0);">size charts</a></span>
        <div class="clear"></div>
    </header>
    <div class="size-box   notranslate" data-size="X-SMALL" data-productcode="09256" data-index="">
        <a id="X-SMALL">
            <span>X-SMALL</span>
        </a>
    </div>
    <div class="size-box   notranslate" data-size="SMALL" data-productcode="09256" data-index="">
        <a id="SMALL">
            <span>SMALL</span>
        </a>
    </div>
    <div class="size-box   notranslate" data-size="MEDIUM" data-productcode="09256" data-index="">
        <a id="MEDIUM">
            <span>MEDIUM</span>
        </a>
    </div>
    <div class="size-box   notranslate" data-size="LARGE" data-productcode="09256" data-index="">
        <a id="LARGE">
            <span>LARGE</span>
        </a>
    </div>
    <div class="size-box   notranslate" data-size="X-LARGE" data-productcode="09256" data-index="">
        <a id="X-LARGE">
            <span>X-LARGE</span>
        </a>
    </div>
    <div class="size-box   notranslate" data-size="XX-LARGE" data-productcode="09256" data-index="">
        <a id="XX-LARGE">
            <span>XX-LARGE</span>
        </a>
    </div>
</section>
<div class="clear"></div>
<hr>
<section id="quantity" class="quantity">
    <h2 class="quantity-header">Quantity:</h2>
    <select id="selectBox" data-index="" class="select-box">
        <option value="1">1</option>
        <option value="2">2</option>
        <option value="3">3</option>
        <option value="4">4</option>
        <option value="5">5</option>
        <option value="6">6</option>
        <option value="7">7</option>
        <option value="8">8</option>
        <option value="9">9</option>
    </select>
    <div class="clear"></div>
</section>
<section id="messaging" class="messaging">
    <!-- Not showing backordered and final sale message if sku is out of stock   -->
</section>
<section id="actions" class="actions">
</section>
<script>
var productDetailsJSON = '{"sizeset":[{"colors":[{"skuLongId":1689949373559852,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373559917,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":4},{"skuLongId":1689949373559858,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373253180,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373253179,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":6,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1}],"size":"X-SMALL"},{"colors":[{"skuLongId":1689949373559850,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246447,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373559857,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373246443,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559916,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1}],"size":"SMALL"},{"colors":[{"skuLongId":1689949373559919,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373559915,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559849,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246441,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373246446,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1}],"size":"MEDIUM"},{"colors":[{"skuLongId":1689949373559854,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559918,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373246440,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559848,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246445,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1}],"size":"LARGE"},{"colors":[{"skuLongId":1689949373246448,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373246444,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559851,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373559855,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559920,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4}],"size":"X-LARGE"},{"colors":[{"skuLongId":1689949373559853,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373253823,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373253822,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559856,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":5,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559921,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4}],"size":"XX-LARGE"}],"colorset":[{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":6,"outofstock":false,"sizelabel":"X-SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false}],"color":"BL8362","fullydomqty":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"finalsale":false},{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false}],"color":"GY7314","fullydomqty":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"finalsale":false},{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":5,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false}],"color":"MF3369","fullydomqty":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"finalsale":false}]}';
var imgSelectedColor = 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_fs418$';
var wishlistSize = '0';
var wishlistIsDefaultList = 'false';
var editFlagAjax = 'false';
var editWishListFlagAjax = 'false';
var isSaleProduct = 'false';
var editDomRtlItem = 'false';
if(editDomRtlItem === 'true') {
    $('#monogram' + '').hide();
}
</script>

 

  In the last <script> tag, the productDetailsJSON variable is exactly what we need. Once parsed, it gives the relationship between color, size, and inventory.

Here is the source code:
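As a rough illustration of that parsing step, here is a minimal sketch using only the standard library. The `response_text` string is a heavily trimmed stand-in for the sample response above (one color, two sizes), and the function name `extract_inventory` is hypothetical.

```python
import json
import re

# Trimmed stand-in for the Ajax response; values are taken from the sample
# above, but the real string carries many more fields and colors.
response_text = """
<script>
var productDetailsJSON = '{"colorset":[{"sizes":[{"inventory":9,"sizelabel":"LARGE"},{"inventory":6,"sizelabel":"X-SMALL"}],"color":"BL8362","colordisplayname":"hthr indigo"}]}';
var wishlistSize = '0';
</script>
"""


def extract_inventory(text):
    """Return {color id: {'name': ..., 'sizes': {size label: inventory}}}."""
    match = re.search(r"productDetailsJSON = '(.*?)';", text, re.DOTALL)
    if match is None:
        raise ValueError('productDetailsJSON not found')
    details = json.loads(match.group(1))
    return {
        color['color']: {
            'name': color['colordisplayname'],
            'sizes': {s['sizelabel']: s['inventory'] for s in color['sizes']},
        }
        for color in details['colorset']
    }


inventory = extract_inventory(response_text)
print(json.dumps(inventory, indent=2))
```

The `colorset` array is already grouped by color, which is why it is a more convenient starting point than `sizeset` when color is the key.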

# -*- coding: utf-8 -*-
__author__ = ''

import json
import re
import sys

import requests as req
from pyquery import PyQuery

import spiderBase  # the author's own base-class module (not shown in this post)

req.packages.urllib3.disable_warnings()  # do not show the HTTPS warnings


class drag(spiderBase.spiderBase):

    def __init__(self, url):
        self.headers = {
            'Referer': 'https://www.jcrew.com/',
            'Connection': 'Keep-Alive',
            'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
            'Accept': 'text/html, application/xhtml+xml, */*',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36'
        }
        self.session = req.session()
        back_data = self.session.get(url, headers=self.headers, verify=False)
        self.data = PyQuery(back_data.text)
        self.status_code = back_data.status_code
        self.retutn_url = back_data.history  # attribute name kept exactly as in the original
        url_list = url.split('/')
        self.url_protocol = url_list[0]
        self.domain = url_list[0] + '//' + url_list[2]
        self.url = url

    def list(self):
        data = self.data
        self.urlList = []
        page_num = data(".product-image-wrap").items()
        for page_t in page_num:
            name = page_t.parent("div").parent("div div.plus_detail_wrap a span:eq(0)").text()
            price = page_t.parent("div").parent("div div.plus_detail_wrap a span:eq(1)").text()
            self.urlList.append(dict(url=page_t.attr("href"), img=page_t("img").attr("src"), name=name, price=price))
        return json.dumps(self.urlList)

    def page(self, gid=0, channel_id=0):
        pyhtml = self.data
        # Below is the version as modified by cc
        # print(pyhtml('script').text())
        self.descr = pyhtml('div#prodDtlBody li').text()
        # Match the product code
        prodCode = re.search(r'data-productcode="(\d*?)"', str(pyhtml), re.DOTALL).groups()[0]
        # Currency unit: '$' for USD, otherwise the raw currency code
        currency = re.search(r'current_currency=\'(.*?)\'', pyhtml('script').text(), re.DOTALL).groups()[0]
        self.unit = '$' if currency == 'USD' else currency
        # Request the product details
        prodDetailsTxt = self.session.get(self.domain + '/browse2/ajax/product_details_ajax.jsp?prodCode=%s' % prodCode, headers=self.headers, verify=False).text
        self.name = re.search(r'productName = \'(.*?)\'', prodDetailsTxt, re.DOTALL).groups()[0]
        self.brand = 'J.CREW'
        # Fit variants: [{'old_price': 'xxx', 'price': 'xxx', 'name': 'xxx'}, {'old_price': 'xxx', 'price': 'xxx', 'name': 'xxx'}]
        # old_price falls back to the current price when no "valued at" price is present
        variants = [
            {
                'name': PyQuery(variant)('span.notranslate').eq(0).text(),
                'price': PyQuery(variant)('span.selected-color-price').text()[1:],
                'old_price': PyQuery(variant)('span.price-soldout').text()[1:]
                if PyQuery(variant)('span.price-soldout').text() != ''
                else PyQuery(variant)('span.selected-color-price').text()[1:],
            }
            for variant in PyQuery(prodDetailsTxt)('#variants').children('div')
        ]
        # Color id -> image (the official site has only one image per color by default)
        coloridImg = dict([(PyQuery(box)('a').attr('id'), PyQuery(box)('img').attr('data-imgurl').replace('$pdp_fs418$', '$pdp_enlarge$')) for box in PyQuery(prodDetailsTxt)('div.color-box')])
        # Match the color/size/inventory relationship
        productDetailsJSON = json.loads(re.search(r'productDetailsJSON = \'(.*?)\';', prodDetailsTxt, re.DOTALL).groups()[0])
        map_colorSizeInventory = {}
        for color in productDetailsJSON['colorset']:
            map_colorSizeInventory[color['color']] = {
                'colorName': color['colordisplayname'].replace(' ', '-'),
                'sizes': [{'size': sizeInv['sizelabel'].lower(), 'inventory': sizeInv['inventory']} for sizeInv in color['sizes']],
            }
        # Map color ids to color names
        imgs_tmp = dict([(map_colorSizeInventory[colorid]['colorName'], [img]) for colorid, img in coloridImg.items()])
        size_tmp = dict([(cs['colorName'], cs['sizes']) for cs in map_colorSizeInventory.values()])
        colors_tmp = list(imgs_tmp.keys())
        # Map prices
        self.price = dict([(colorName + '_' + variant['name'], variant['price']) for variant in variants for colorName in colors_tmp])
        self.old_price = dict([(colorName + '_' + variant['name'], variant['old_price']) for variant in variants for colorName in colors_tmp])
        self.colors = [colorName + '_' + variant['name'] for variant in variants for colorName in colors_tmp]

        # Equivalent explicit loops:
        # self.price = {}
        # self.old_price = {}
        # self.colors = []
        # for variant in variants:
        #     for colorName in colors_tmp:
        #         self.price[colorName + '_'+ variant['name']] = variant['price']
        #         self.old_price[colorName + '_'+ variant['name']] = variant['old_price']
        #         self.colors.append(colorName + '_'+ variant['name'])
        # Add the variant to the keys of size and imgs
        self.imgs = dict([(colorName + '_' + variantName, img) for variantName in [variant['name'] for variant in variants] for colorName, img in imgs_tmp.items()])
        self.size = dict([(colorName + '_' + variantName, sizes) for variantName in [variant['name'] for variant in variants] for colorName, sizes in size_tmp.items()])
        # Equivalent explicit loops:
        # self.imgs = {}
        # self.size = {}
        # for variantName in [ variant['name'] for variant in variants] :
        #     for colorName,img in imgs_tmp.items():
        #         self.imgs[colorName+'_'+variantName] = img
        #     for colorName,sizes in size_tmp.items():
        #         self.size[colorName+'_'+variantName] = sizes
        self.channel_id = channel_id
        self.designer = ''
        self.img = ''
        self.returns = ''
        print(variants)
        print(self.imgs)
        print(self.size)
        print(self.colors)
        print(self.unit)
        return self.returnData()  # implemented in the parent class; can be commented out


if __name__ == '__main__':
    if len(sys.argv) == 3:
        action = sys.argv[1]
        url = sys.argv[2]
    else:
        # Listing-page alternative:
        # url = 'https://factory.jcrew.com/mens_clothing/wear_to_work.jsp'
        # action = "list"
        url = 'https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo'
        action = "page"
    obj = drag(url)
    # Look the method up by name instead of eval()-ing a string built from argv
    print(getattr(obj, action)())

 

 

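 Leaving out the comments, analyzing one site took a little under 100 lines. Python's list comprehensions really are powerful ^_^

 After the parent class serializes the result to JSON, the returned string, pretty-printed, looks like this (six products in total):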
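To make the comprehension pattern concrete, here is a small self-contained sketch of the nested form used for the price mapping, next to the equivalent explicit loops. The sample values are taken from the example in this post; the variable names `color_names` and `price_loop` are illustrative only.

```python
# Sample values from the example product in this post.
variants = [
    {'name': 'Regular', 'price': '29.50', 'old_price': '59.50'},
    {'name': 'Tall', 'price': '32.00', 'old_price': '64.50'},
]
color_names = ['hthr-cabernet', 'hthr-ebony', 'hthr-indigo']

# Comprehension form: the first `for` is the outer loop, the second the inner.
price = dict([(color + '_' + v['name'], v['price'])
              for v in variants for color in color_names])

# Equivalent explicit loops.
price_loop = {}
for v in variants:
    for color in color_names:
        price_loop[color + '_' + v['name']] = v['price']

print(price == price_loop)
print(sorted(price))
```

Two variants times three colors gives six keys, which matches the six products in the result.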

[
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-cabernet_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "x-large"}, {"inventory": 5, "size": "xx-large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-ebony_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 9, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-indigo_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 6, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-cabernet_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "x-large"}, {"inventory": 5, "size": "xx-large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-ebony_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 9, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-indigo_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 6, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    }
]
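The parent class's `returnData()` is not shown in this post, so the following is only a guess at the kind of flattening that turns the dicts keyed by `color_variant` into one record per key. The function name `to_records` and its field choices are assumptions based on the output above (for instance, `img` being the first entry of `imgs`, and `size` being a serialized string).

```python
import json


def to_records(name, colors, price, old_price, imgs, size):
    """Hypothetical flattening step: one record per color_variant key."""
    return [
        {
            'name': name,
            'colors': key,
            'price': price[key],
            'old_price': old_price[key],
            'img': imgs[key][0],           # assumed: first image doubles as cover
            'imgs': imgs[key],
            'size': json.dumps(size[key]),  # assumed: sizes stored as a JSON string
        }
        for key in colors
    ]


# Two of the six keys from the example, to keep the sketch short.
colors = ['hthr-indigo_Regular', 'hthr-indigo_Tall']
img_url = 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$'
sizes = [{'inventory': 9, 'size': 'large'}, {'inventory': 6, 'size': 'x-small'}]

records = to_records(
    name='Factory heathered sweatshirt sweater',
    colors=colors,
    price={'hthr-indigo_Regular': '29.50', 'hthr-indigo_Tall': '32.00'},
    old_price={'hthr-indigo_Regular': '59.50', 'hthr-indigo_Tall': '64.50'},
    imgs={key: [img_url] for key in colors},
    size={key: sizes for key in colors},
)
print(json.dumps(records, indent=4))
```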

 

 2015-12-16 14:07:59

 

Posted 2015-12-16 14:09 by 超超xc