collin_pxy

June 29, 2020

Spider_Basics Summary 2_Requests + BeautifulSoup for Parsing HTML

Summary: Static page scraping example: import requests from bs4 import BeautifulSoup def gettop250(): headers={ 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKi Read more

posted @ 2020-06-29 11:54 collin_pxy Views(102) Comments(0) Recommendations(0)

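Spider_Basics Summary 5--Scraping Dynamic Pages--Inspecting Elements--json--Dictionaries

Summary: # Everything a static page shows in the browser is present in its HTML source. Most modern pages use JavaScript, however, so much of their content never appears in the HTML source, and # requests + BeautifulSoup alone cannot retrieve it. For example: # scraping a dynamic page with requests + BeautifulSoup will not succeed: Read more

posted @ 2020-06-29 11:34 collin_pxy Views(362) Comments(0) Recommendations(0)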
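
The post title pairs dynamic pages with json and dictionaries. As a minimal sketch of that pairing: once the browser's element inspector reveals the request that actually carries the data, its JSON body can be decoded into a Python dict. The payload below and its key names (`results`, `parents`, `content`) are hypothetical stand-ins, not taken from the post.

```python
import json

# Hypothetical JSON text, standing in for the body of a data request
# found in the browser's developer tools (Network tab).
payload = '{"results": {"parents": [{"content": "first comment"}, {"content": "second comment"}]}}'


def extract_comments(json_text):
    """Decode JSON text into a dict and collect each comment's text."""
    data = json.loads(json_text)          # JSON object -> Python dict
    parents = data["results"]["parents"]  # nested dict, then a list of dicts
    return [item["content"] for item in parents]


comments = extract_comments(payload)
print(comments)
```

With a live site, the same function could be given `requests.get(url).text`, where `url` is whatever data endpoint the inspector shows.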

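June 26, 2020

Spider_Basics Summary 7_Basic Crawler Template (3 Classes)

Summary: # Chapter 4: handling different website layouts: # When scraping similar content from sites that serve a similar purpose, the layouts often differ (the same content may sit in different tags). Because the number of sites we crawl is usually limited, # there is no need to develop a unified, complex algorithm or machine-learning model to recognize which text on a page looks like a title or a paragraph; it is enough to inspect the pages manually Read more

posted @ 2020-06-26 18:06 collin_pxy Views(323) Comments(0) Recommendations(0)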
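
One way the idea above could be sketched: store the per-site layout, found by manual inspection, as data, and keep the parsing logic generic. The three class names (`Content`, `Website`, `Crawler`), the CSS selectors, and the sample HTML are assumptions for illustration; the full post may structure its three classes differently.

```python
from dataclasses import dataclass

from bs4 import BeautifulSoup


@dataclass
class Content:
    """What we want from every page, regardless of layout."""
    url: str
    title: str
    body: str


@dataclass
class Website:
    """Per-site layout: CSS selectors found by inspecting the page by hand."""
    name: str
    title_selector: str
    body_selector: str


class Crawler:
    def _text(self, soup, selector):
        # Join the text of every element that matches the selector.
        return "\n".join(el.get_text(strip=True) for el in soup.select(selector))

    def parse(self, site, url, html):
        soup = BeautifulSoup(html, "html.parser")
        title = self._text(soup, site.title_selector)
        body = self._text(soup, site.body_selector)
        if not title or not body:
            return None  # the layout did not match this site's selectors
        return Content(url, title, body)


# Two hypothetical sites that hold the same kind of content in different tags.
site_a = Website("Site A", "h1", "div.article p")
site_b = Website("Site B", "h2.headline", "section#story")
html_a = '<h1>Title A</h1><div class="article"><p>Body A</p></div>'
html_b = '<h2 class="headline">Title B</h2><section id="story">Body B</section>'

crawler = Crawler()
page_a = crawler.parse(site_a, "http://site-a.example/1", html_a)
page_b = crawler.parse(site_b, "http://site-b.example/1", html_b)
print(page_a)
print(page_b)
```

Supporting another site then only takes a new `Website` entry, not new parsing code.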

June 25, 2020

Spider--Supplement_None_global_urlparse

Summary: # Supplementary notes: # 1) None: a = None if a: print("not None") else: print("None") if a is not None: print("not None") else: print("None") # None # None a = '' if a: Read more

posted @ 2020-06-25 22:32 collin_pxy Views(84) Comments(0) Recommendations(0)
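
The truncated snippet above appears to contrast `if a:` with `if a is not None:`. A small sketch of that difference follows, together with a `urlparse` call since the title mentions it. The `describe` helper and the query string and fragment on the sample URL are illustrative assumptions; `global` is not covered here.

```python
from urllib.parse import urlparse


def describe(value):
    """Report both the truthiness test and the explicit None test."""
    truthy = "truthy" if value else "falsy"
    none_check = "is None" if value is None else "is not None"
    return f"{truthy}, {none_check}"


# '' and 0 and [] are falsy like None, yet none of them *is* None.
for value in (None, "", 0, [], "text"):
    print(repr(value), "->", describe(value))

# urlparse splits a URL into named components.
parts = urlparse("https://www.pythonscraping.com/pages/page1.html?x=1#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```

When a function can legitimately return an empty string or `0`, `is not None` is the safer test, because `if value:` would treat those results as missing.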

Spider_Practice_Crawling All Page Links from a Static Page with BeautifulSoup

Summary: # Get the href attribute values of all <a> tags on the Baidu homepage: # import requests # from bs4 import BeautifulSoup # # html = requests.get('http://en.wikipedia.org/wiki/Kevin_Bacon') # h Read more

posted @ 2020-06-25 17:50 collin_pxy Views(783) Comments(0) Recommendations(0)
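
A minimal sketch of collecting every `href` from a static page. It parses an inline HTML string so it runs without network access; the sample markup, the `collect_links` helper, and the `urljoin` and de-duplication steps are assumptions, not necessarily what the full post does.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<a href="/wiki/Kevin_Bacon">Kevin Bacon</a>
<a href="https://example.com/about">About</a>
<a name="no-href">anchor without href</a>
<a href="/wiki/Kevin_Bacon">duplicate</a>
"""


def collect_links(html, base_url):
    """Return the absolute, de-duplicated href values of all <a> tags."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute at all.
    for tag in soup.find_all("a", href=True):
        absolute = urljoin(base_url, tag["href"])  # resolve relative links
        if absolute not in links:
            links.append(absolute)
    return links


links = collect_links(html, "http://en.wikipedia.org/")
print(links)
```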

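June 23, 2020

Spider_Basics Summary 4_bs.find_all() with Regular Expressions and Lambda Expressions

Summary: # BeautifulSoup's find() and find_all() methods are also often combined with regular expressions and lambda expressions: # 1-Using bs.find_all() with regular expressions: # The syntax is shown in the example: # Find all images that match a condition: import requests from bs4 impor Read more

posted @ 2020-06-23 16:20 collin_pxy Views(978) Comments(0) Recommendations(0)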
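
The two combinations named above could look roughly like this. The HTML and the image-path pattern are made up for illustration; only `find_all`, `re.compile`, and passing a function as the filter are taken as given.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the image paths are invented for the example.
html = """
<img src="../img/gifts/img1.jpg">
<img src="../img/gifts/img2.jpg">
<img src="../img/logo.png">
<p class="a" id="b">two attributes</p>
<p class="a">one attribute</p>
"""
soup = BeautifulSoup(html, "html.parser")

# 1) Regular expression as an attribute filter: keep only <img> tags
#    whose src matches the pattern.
images = soup.find_all("img", {"src": re.compile(r"\.\./img/gifts/img.*\.jpg")})
gift_sources = [img["src"] for img in images]
print(gift_sources)

# 2) Lambda as a filter: the function receives each tag and returns
#    True for the tags to keep (here, tags with exactly two attributes).
two_attr_tags = soup.find_all(lambda tag: len(tag.attrs) == 2)
print([tag.get_text() for tag in two_attr_tags])
```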

June 22, 2020

Spider_Basics Summary 3_BeautifulSoup Objects + find() + find_all()

Summary: # In this section: # Parsing complex HTML pages: # 1--bs.find() bs.find_all() tag.get_text() # find_all(tag/tag_list,attributes_dict,recursive,text,limit,keywords) # find(tag/ Read more

posted @ 2020-06-22 20:35 collin_pxy Views(220) Comments(0) Recommendations(0)
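
To make the argument list above concrete, here is a short sketch that exercises the tag, tag-list, attributes-dict, `limit`, `recursive`, and keyword arguments on a small, made-up HTML fragment. The markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the example.
html = """
<div id="text">
  <h1>Title</h1>
  <span class="green">Anna</span>
  <span class="red">Well, Prince</span>
  <span class="green">Prince Vasili</span>
  <p><span class="green">nested name</span></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match; find_all() returns every match.
first_green = soup.find("span", {"class": "green"})
all_green = soup.find_all("span", {"class": "green"})
names = [tag.get_text() for tag in all_green]  # get_text() strips the markup

# A list of tag names matches any of them; limit caps the result count.
headings_and_spans = soup.find_all(["h1", "span"], limit=2)

# A keyword argument filters on an attribute; recursive=False looks
# only at direct children instead of all descendants.
container = soup.find("div", id="text")
direct_spans = container.find_all("span", recursive=False)

print(first_green.get_text(), names, len(direct_spans))
```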

Spider_Basics Summary 2_Requests Exceptions

Summary: # 1: Basic use of BeautifulSoup: import requests from bs4 import BeautifulSoup html=requests.get('https://www.pythonscraping.com/pages/page1.html') bs=Beautif Read more

posted @ 2020-06-22 14:49 collin_pxy Views(188) Comments(0) Recommendations(0)

June 17, 2020

Python--Installing PyQt5, pyqt5-tools

Summary: # Using the Douban mirror, enter the following at the Anaconda Prompt: pip install pyqt5-tools -i https://pypi.douban.com/simple/ Read more

posted @ 2020-06-17 17:59 collin_pxy Views(2064) Comments(0) Recommendations(0)

ICO Icon Generation--Online Tool

Summary: http://www.ico51.cn/ Read more

posted @ 2020-06-17 11:23 collin_pxy Views(287) Comments(0) Recommendations(0)
