Python Web Scraping: Crawling Tencent Careers with Selenium

Target: the Tencent Careers site

 

Analysis

The content we want to scrape sits in the child divs of the large div with class="recruit-wrap recruit-margin".

XPath locator:

//*[@class="recruit-wrap recruit-margin"]/div
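The XPath can be sanity-checked offline with the standard library, no browser needed. The HTML snippet below is invented for illustration and only mimics the class structure; note that xml.etree requires a leading `.//` instead of `//` for a document-wide search.

```python
import xml.etree.ElementTree as ET

# Invented sample markup (assumption: it only mimics the page's class structure)
html = """
<html><body>
  <div class="recruit-wrap recruit-margin">
    <div>Job A</div>
    <div>Job B</div>
  </div>
</body></html>
"""

root = ET.fromstring(html)
# ElementTree's XPath subset uses .// where full XPath uses //
jobs = root.findall(".//*[@class='recruit-wrap recruit-margin']/div")
print(len(jobs))  # 2
```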

Getting started

Import the libraries


from selenium import webdriver
from selenium.webdriver.common.by import By

Open the page in the browser

driver = webdriver.Chrome()
driver.get('https://careers.tencent.com/search.html?index=1&keyword=python')

Locate the content

div_list = driver.find_elements(By.XPATH, '//*[@class="recruit-wrap recruit-margin"]/div')

A note on why we do it this way: why use find_elements() when find_element_by_xpath() looks so much more convenient? The answer is that find_element_by_xpath() no longer works in current versions of Selenium. Running it produces a deprecation warning telling you to use find_elements() instead, and if you don't switch, the code won't run at all; you'll also notice the editor draws a strikethrough line through the old method name as you type it.

That is why the extra import line

from selenium.webdriver.common.by import By

is needed at the top: it exists to replace the find_element_by_xxxx() family. To get the corresponding locator, pass By.XXXX (where XXXX is the locator type you want) as the first argument. For example, if I want the behaviour of

find_element_by_name()

I can now write

find_element(By.NAME, ...)
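The mapping from the removed helper methods to By constants can be written out explicitly. Selenium's By constants are plain strings, so the table below mirrors their values directly and runs without a browser (the string values shown are Selenium's actual By values):

```python
# Old Selenium helper method -> new find_element(By.X, arg) call.
# By constants are plain strings; mirrored here so no browser is needed.
OLD_TO_NEW = {
    'find_element_by_xpath':        ('By.XPATH', 'xpath'),
    'find_element_by_name':         ('By.NAME', 'name'),
    'find_element_by_link_text':    ('By.LINK_TEXT', 'link text'),
    'find_element_by_css_selector': ('By.CSS_SELECTOR', 'css selector'),
}

for old, (const, value) in OLD_TO_NEW.items():
    print(f'{old}(arg)  ->  find_element({const}, arg)   # {const} == {value!r}')
```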

Extract the content

one_job_info_list = div.text.split('\n')
item = {}
item['title'] = one_job_info_list[0].strip()
item['tips'] = one_job_info_list[1].strip()
item['text'] = one_job_info_list[2].strip()
print(item)
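The split/strip logic can be exercised on a fabricated stand-in for div.text. The sample text below is invented; the real listing may contain more lines, which this indexing would ignore.

```python
# Fabricated stand-in for div.text (assumption about the page's line layout)
sample_text = 'Backend Developer (Python)\nTech | Shenzhen | 2022-02-10\nDesign and build backend services'

one_job_info_list = sample_text.split('\n')
item = {}
item['title'] = one_job_info_list[0].strip()
item['tips'] = one_job_info_list[1].strip()
item['text'] = one_job_info_list[2].strip()
print(item)
```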

 

Full code


from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://careers.tencent.com/search.html?index=1&keyword=python')
driver.implicitly_wait(10)  # the job list is rendered by JavaScript, so give it time to appear
div_list = driver.find_elements(By.XPATH, '//*[@class="recruit-wrap recruit-margin"]/div')
for div in div_list:
    one_job_info_list = div.text.split('\n')
    item = {}
    item['title'] = one_job_info_list[0].strip()
    item['tips'] = one_job_info_list[1].strip()
    item['text'] = one_job_info_list[2].strip()
    print(item)
driver.quit()

Next, crawl every page

Analysis

 

As you can see, the only part of the URL that changes is the value after index=, so with requests you could write:

import requests

for page in range(1, 10):
    url = 'https://careers.tencent.com/search.html?index={}&keyword=python'.format(page)
    response = requests.get(url=url)
    response.encoding = response.apparent_encoding
    html = response.text
    print(html)

In the Selenium version, you only need to build the same URLs and load each one:

for page in range(1, 10):
    url = 'https://careers.tencent.com/search.html?index={}&keyword=python'.format(page)
    driver.get(url)

and then repeat the extraction step above for each page.
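The URLs the loop generates can be checked without launching a browser:

```python
# Build the nine page URLs exactly as the loop above does
urls = [
    'https://careers.tencent.com/search.html?index={}&keyword=python'.format(page)
    for page in range(1, 10)
]

print(urls[0])    # first page: index=1
print(len(urls))  # 9 -- range(1, 10) covers pages 1 through 9
```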

posted @ 2022-02-12 21:09  冷巷