Quickly extracting edu SRC page listing information with Python

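This script makes it easy to quickly extract information from the edu SRC pages, which helps with SRC information gathering and with preparing vulnerability reports.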

edu SRC page information

Extracted information

Code:

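      import time

      import requests
      from lxml import etree


      def edu_list(pages):
          """Scrape list pages 1..pages and append the extracted text to src.txt."""
          for page in range(1, pages + 1):
              try:
                  url = 'https://src.sjtu.edu.cn/list/?page=' + str(page)
                  data = requests.get(url, timeout=10).content
                  # Parse the HTML and take the link text from each matching table cell
                  soup = etree.HTML(data.decode('utf-8'))
                  result = soup.xpath('//td[@class=""]/a/text()')

                  # Join the text nodes, then split on whitespace to drop padding and blanks
                  results = '\n'.join(result).split()
                  print(results)
                  # Open the output file once per page and append one entry per line
                  with open('src.txt', 'a+', encoding='utf-8') as f:
                      for edu in results:
                          f.write(edu + '\n')
              except Exception as e:
                  # Report the failure, pause briefly, and continue with the next page
                  print('page', page, 'failed:', e)
                  time.sleep(0.5)


      if __name__ == '__main__':
          edu_list(10)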
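
Because the script opens src.txt in append mode, running it more than once leaves repeated entries in the file. A minimal sketch of a follow-up step for information gathering could count how often each entry appears. The helper names `summarize_names` and `load_summary` are hypothetical and not part of the original script, and the sketch assumes src.txt holds one entry per line, as written above.

```python
from collections import Counter


def summarize_names(lines):
    """Count how often each non-empty, stripped entry appears (hypothetical helper)."""
    names = (line.strip() for line in lines)
    return Counter(name for name in names if name)


def load_summary(path="src.txt"):
    """Read the scraper's output file and return the per-entry counts (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return summarize_names(f)
```

Calling `load_summary().most_common(10)` would then list the ten entries that occur most often in the file.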

posted @ 2021-04-16 03:28  九~月