Python web scraping - Post category - zou-ting-rong

CSS selectors

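Summary:
Selector | Example | Description
.class | .intro | selects all elements with class="intro"
#id | #firstname | selects the element with id="firstname"
* | * | selects all elements
element | p | selects all <p> elements
element,element | div,p | selects all <div> elements and <…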
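To try the selectors from the table, here is a minimal sketch that runs each one through BeautifulSoup's `select()` method, which accepts CSS selectors. The HTML snippet is made up purely for illustration; it is not taken from the post.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only to exercise the selectors from the table
html = """
<div>
  <p class="intro" id="firstname">Hello</p>
  <p>Second paragraph</p>
  <span class="intro">Note</span>
</div>
"""

# html.parser ships with Python, so no extra parser is needed
soup = BeautifulSoup(html, "html.parser")

# .class -> every element whose class list contains "intro"
by_class = [tag.name for tag in soup.select(".intro")]
# #id -> the element with id="firstname"
by_id = [tag.get_text() for tag in soup.select("#firstname")]
# * -> every element in the document
everything = [tag.name for tag in soup.select("*")]
# element -> every <p> element
paragraphs = [tag.get_text() for tag in soup.select("p")]
# element,element -> every <div> and every <p>, in document order
div_or_p = [tag.name for tag in soup.select("div, p")]

print(by_class)    # ['p', 'span']
print(by_id)       # ['Hello']
print(everything)  # ['div', 'p', 'p', 'span']
print(paragraphs)  # ['Hello', 'Second paragraph']
print(div_or_p)    # ['div', 'p', 'p']
```

Note that `select()` always returns a list, even for an id selector; use `select_one()` when you only want the first match.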

posted @ 2020-10-18 14:29 zou-ting-rong Views(96) Comments(0) Recommended(0)

Ajax data scraping: scraping Weibo

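Summary: Ajax, short for Asynchronous JavaScript and XML, is not a programming language but a technique that uses JavaScript to exchange data with the server and update part of a web page without reloading the page or changing its URL. On a traditional web page, updating the content requires reloading the whole page; with Ajax, the content can be updated without a reload.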
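Because an Ajax request fetches data rather than a whole page, scraping it usually means calling the data URL directly and reading the JSON it returns. The sketch below assumes a hypothetical endpoint (`https://example.com/api/feed`) with hypothetical `page`/`count` parameters and a made-up response shape; the real Weibo endpoint and fields have to be looked up in the browser's developer tools (Network tab, XHR filter).

```python
import json
from urllib.parse import urlencode

# Hypothetical Ajax endpoint; replace it with the URL seen in the Network tab
BASE_URL = "https://example.com/api/feed"


def build_ajax_url(page: int) -> str:
    """Build the URL of the Ajax request for one page of data."""
    params = {"page": page, "count": 10}  # parameter names are assumptions
    return BASE_URL + "?" + urlencode(params)


def parse_ajax_response(body: str) -> list[str]:
    """Ajax responses are usually JSON; pull out the text of each item."""
    data = json.loads(body)
    # The data -> items -> text layout is a made-up example structure
    return [item["text"] for item in data.get("data", {}).get("items", [])]


# A made-up response body standing in for what the server would return
sample = '{"ok": 1, "data": {"items": [{"text": "first post"}, {"text": "second post"}]}}'

print(build_ajax_url(2))            # https://example.com/api/feed?page=2&count=10
print(parse_ajax_response(sample))  # ['first post', 'second post']
```

With a live endpoint, `requests.get(url, headers=...)` followed by `response.json()` would replace the manual `json.loads` step.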

posted @ 2020-10-17 22:27 zou-ting-rong Views(402) Comments(0) Recommended(0)

Scraping the Weibo hot search list with BeautifulSoup

Summary: Get the URL, set the request headers, send a GET request with requests, instantiate a BeautifulSoup object, and extract the data with BeautifulSoup.
    import requests
    from bs4 import BeautifulSoup

    url = "https://s.weibo.com/top/su…
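The five steps in the summary map onto two small functions: one that sends the GET request and one that parses the HTML. This is a sketch, not the post's code: the User-Agent value is only an example, and the default selector `td.td-02 a` is an assumption about the hot search page's layout that should be checked against the live page.

```python
import requests
from bs4 import BeautifulSoup

# Step 2: request headers; a browser-like User-Agent (value is only an example)
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url: str) -> str:
    """Step 3: send the GET request and return the page as text."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    response.encoding = response.apparent_encoding  # guess the charset from the body
    return response.text


def parse_hot_list(html: str, selector: str = "td.td-02 a") -> list[str]:
    """Steps 4-5: build a BeautifulSoup object and pull out the link texts.

    The default selector is an assumption about the page layout.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(selector)]
```

Usage would be `print(parse_hot_list(fetch_html(url)))`, with `url` set to the hot search page from the post. Keeping parsing separate from fetching means the parser can be tested on saved HTML without hitting the site.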

posted @ 2020-10-16 15:41 zou-ting-rong Views(505) Comments(0) Recommended(0)

The BeautifulSoup parsing library

Summary:
    html = """
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>this is a Demo</title>
    </head>
    <body>
    <div id="container">
    …
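Here is a short sketch of the basic BeautifulSoup calls applied to a demo document like the one above. The document's opening lines follow the excerpt; the `<ul>` inside `#container` is a hypothetical continuation, since the rest of the original is cut off.

```python
from bs4 import BeautifulSoup

# The head mirrors the excerpt; the <ul> inside #container is made up
html = """
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>this is a Demo</title>
</head>
<body>
<div id="container">
  <ul class="list">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
</div>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                     # text inside the first <title> tag
container = soup.find("div", id="container")  # first tag matching name + attribute
items = [li.get_text() for li in container.find_all("li")]  # every match, as a list
ul_class = container.ul["class"]              # attribute access; class is multi-valued

print(title)     # this is a Demo
print(items)     # ['Foo', 'Bar']
print(ul_class)  # ['list']
```

`find()` returns a single tag (or `None`), while `find_all()` always returns a list, which is why the two are used for the container and the items respectively.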

posted @ 2020-10-16 14:48 zou-ting-rong Views(177) Comments(0) Recommended(0)

Understanding HTML tags

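Summary: I have recently been learning web scraping, and scraping a page's content requires being able to read the page's basic structure so that we can extract exactly what we need. This post briefly explains what some tags mean. Tag relationships: containment (nesting), e.g. <head><title></title></head>; sibling (parallel) relationship, e.g. <head></head> <body…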
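The two tag relationships described above, containment and siblings, show up directly in a parser's navigation API. The sketch below uses BeautifulSoup's `.parent`, `.children`, and `find_next_sibling()` on a tiny made-up page.

```python
from bs4 import BeautifulSoup

# A tiny made-up page: <title> nested in <head>; <head> and <body> side by side
html = "<html><head><title>Demo</title></head><body><p>text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Containment: <title> sits inside <head>, so <head> is its parent
title_parent = soup.title.parent.name
# Sibling relationship: <head> and <body> share the same parent, <html>
head_next = soup.head.find_next_sibling().name
html_children = [child.name for child in soup.html.children]

print(title_parent)   # head
print(head_next)      # body
print(html_children)  # ['head', 'body']
```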

posted @ 2020-10-15 19:14 zou-ting-rong Views(176) Comments(0) Recommended(0)

Scraping the Weibo hot search list

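Summary: The scraping process: import the requests and etree modules, set the URL, set the request headers, fetch the HTML as a string, parse it with etree, extract the data with XPath, and print what was extracted.
    import requests
    from lxml import etree

    # set the URL
    url = "https://s.weib…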
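The same pipeline can be sketched with lxml: fetch the HTML string with requests, turn it into an element tree with `etree.HTML`, and run an XPath query. The default XPath `//td[@class="td-02"]/a/text()` and the User-Agent value are assumptions for illustration, not taken from the post.

```python
import requests
from lxml import etree

# Request headers; a browser-like User-Agent (value is only an example)
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url: str) -> str:
    """Fetch the page and return the HTML as a string."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def extract_titles(html: str, xpath: str = '//td[@class="td-02"]/a/text()') -> list[str]:
    """Parse with etree and extract text nodes via XPath (default XPath is a guess)."""
    tree = etree.HTML(html)  # lenient HTML parser; returns the root <html> element
    return [text.strip() for text in tree.xpath(xpath)]
```

Printing the results is then `for title in extract_titles(fetch_html(url)): print(title)`.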

posted @ 2020-10-14 15:12 zou-ting-rong Views(86) Comments(0) Recommended(0)

Parsing library: XPath

Summary:
    from lxml import etree
    text = '''
    <div>
    <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href=…
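Here are a few common XPath patterns run against a list like the one in the excerpt. The first two `<li>` items follow the excerpt (with the attribute spelled `href`); the third item and its `active` class are a hypothetical addition to show the difference between exact attribute matching and `contains()`.

```python
from lxml import etree

# First two items follow the excerpt; the third is made up for the contains() demo
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html">third item</a></li>
</ul>
</div>
'''

html = etree.HTML(text)

# Text of every link
all_texts = html.xpath('//li/a/text()')
# Exact attribute match: class must equal "item-0", so "item-0 active" is skipped
exact = html.xpath('//li[@class="item-0"]/a/@href')
# contains() also matches multi-valued classes such as "item-0 active"
partial = html.xpath('//li[contains(@class, "item-0")]/a/@href')
# Position predicate: the last <li>
last_item = html.xpath('//li[last()]/a/text()')
# Parent axis: from a link back to its <li>'s class
parent_class = html.xpath('//a[@href="link2.html"]/../@class')

print(all_texts)     # ['first item', 'second item', 'third item']
print(exact)         # ['link1.html']
print(partial)       # ['link1.html', 'link3.html']
print(last_item)     # ['third item']
print(parent_class)  # ['item-1']
```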

posted @ 2020-10-13 16:41 zou-ting-rong Views(141) Comments(0) Recommended(0)

Python: the requests module in detail

Summary: GET requests. First, construct the simplest possible GET request, to http://httpbin.org/get:
    import requests
    r = requests.get("http://httpbin.org/get")
    print(r.text)
    # Output
    { "args": {}, "h…
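Query parameters and headers are usually passed as dicts rather than pasted into the URL. The sketch below builds the request with `requests.Request(...).prepare()` so you can see exactly what would be sent without touching the network. The parameter names and values are made up for illustration.

```python
import requests

url = "http://httpbin.org/get"
# Query-string parameters; the names and values are made up for illustration
params = {"name": "alice", "age": 22}
headers = {"User-Agent": "Mozilla/5.0"}

# Build (but do not send) the request to inspect what requests.get would send
prepared = requests.Request("GET", url, params=params, headers=headers).prepare()

print(prepared.method)                 # GET
print(prepared.url)                    # http://httpbin.org/get?name=alice&age=22
print(prepared.headers["User-Agent"])  # Mozilla/5.0

# Sending it for real (needs network access):
# r = requests.get(url, params=params, headers=headers, timeout=10)
# print(r.status_code)
# print(r.json()["args"])  # httpbin echoes the query parameters under "args"
```

Passing `params` as a dict lets requests handle URL encoding, which matters once values contain spaces or non-ASCII text.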

posted @ 2020-09-27 16:51 zou-ting-rong Views(199) Comments(0) Recommended(0)

Scraping all the wallpapers on a wallpaper site with Python

Summary:
    import requests as r
    from bs4 import BeautifulSoup
    import os
    base_url = "http://www.win4000.com"  # site root
    theme_base_url = "http://www.win4000.com/z…
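A wallpaper scraper boils down to two pieces: collecting image URLs from a page (turning relative `src` values into absolute ones against the site root) and saving each image to disk. This is a minimal sketch assuming images are plain `<img src=...>` tags; the real site's markup may differ (for example lazy-loaded images), and the helper names are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://www.win4000.com"  # site root, as in the post


def image_urls(html: str, base_url: str = BASE_URL) -> list[str]:
    """Collect absolute URLs of every <img> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin leaves absolute URLs alone and resolves relative ones against base_url
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def save_image(url: str, folder: str) -> str:
    """Download one image into folder, naming it after the last URL path segment."""
    os.makedirs(folder, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(folder, filename)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(path, "wb") as f:  # images are binary, so write bytes
        f.write(response.content)
    return path
```

A full run would loop over each theme page, call `image_urls` on its HTML, and pass every URL to `save_image`.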

posted @ 2020-09-27 15:22 zou-ting-rong Views(244) Comments(0) Recommended(0)

Scraping every chapter of Romance of the Three Kingdoms with Python and saving them to a local file

Summary:
    # Scrape every chapter of Romance of the Three Kingdoms

    import urllib
    import urllib.request
    import urllib.parse
    from lxml import etree
    from urllib import error
    import lxml.html
    …
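Using the same modules as the excerpt (`urllib.request`, `urllib.error`, `lxml`), the job splits into fetching a page, reading the chapter links from the table of contents, and appending each chapter to one text file. This is a sketch under assumptions: the default XPath `//ul[@class="chapters"]/li/a` is hypothetical, and pages are assumed to be UTF-8 encoded.

```python
import urllib.request
from urllib import error

from lxml import etree

# A browser-like User-Agent; the exact value is only an example
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch(url: str) -> str:
    """Download a page with urllib; return an empty string if the request fails."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8")  # assumes the page is UTF-8
    except error.URLError as exc:  # HTTPError is a subclass of URLError
        print("request failed:", exc.reason)
        return ""


def chapter_links(html: str, xpath: str = '//ul[@class="chapters"]/li/a') -> list[tuple[str, str]]:
    """Return (title, href) for every chapter link; the default XPath is hypothetical."""
    tree = etree.HTML(html)
    return [(a.text, a.get("href")) for a in tree.xpath(xpath)]


def save_chapter(path: str, title: str, body: str) -> None:
    """Append one chapter to a UTF-8 text file so the whole book ends up in one file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n" + body + "\n\n")
```

A full run would call `chapter_links(fetch(index_url))`, resolve each `href` with `urllib.parse.urljoin`, fetch the chapter page, extract its text with another XPath query, and pass it to `save_chapter`. Opening the file in append mode keeps chapters in the order they are scraped.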

posted @ 2020-09-26 20:11 zou-ting-rong Views(1486) Comments(0) Recommended(0)
