Python: Extracting All URL Links from HTML

Step 1: Find all <a> tags
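Step 1 can be tried offline before fetching a live page. The sketch below runs `find_all('a')` on a small inline HTML string; the `html` snippet is made up for illustration. `find_all('a')` returns every <a> tag, including ones that have no `href` at all.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here instead of a live request
html = """
<html><body>
  <a href="https://www.csdn.net/nav/python">Python</a>
  <a href="/relative/page">Relative</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all('a') returns every <a> tag, whether or not it has an href attribute
a_tags = soup.find_all("a")
print(len(a_tags))  # 3
```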

Step 2: Extract the href attribute from each <a> tag
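The sketch below applies step 2 to another made-up snippet. `Tag.get('href')` returns `None` when the attribute is missing, so missing links show up as `None` in printed output. By contrast, `find_all('a', href=True)` keeps only the tags that actually carry an `href`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the last <a> has no href attribute
html = """
<a href="https://blink.csdn.net/">Blink</a>
<a href="https://edu.csdn.net/">Edu</a>
<a name="top">Back to top</a>
"""

soup = BeautifulSoup(html, "html.parser")

# .get() returns None for a missing attribute instead of raising KeyError
all_hrefs = [a.get("href") for a in soup.find_all("a")]
print(all_hrefs)  # ['https://blink.csdn.net/', 'https://edu.csdn.net/', None]

# href=True keeps only tags that have the attribute, so a["href"] is safe
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)  # ['https://blink.csdn.net/', 'https://edu.csdn.net/']
```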

Using the CSDN homepage as an example, the code is as follows:

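>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get("https://www.csdn.net")    # download the CSDN homepage
>>> demo = r.text                                # page HTML as a string
>>> soup = BeautifulSoup(demo, "html.parser")    # parse with Python's built-in parser
>>> for link in soup.find_all('a'):              # step 1: every <a> tag
...     print(link.get('href'))                  # step 2: its href, or None if absent
...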
http://www.csdn.net
https://blink.csdn.net/
https://blog.csdn.net/rank/writing_rank
https://www.csdn.net/nav/python
https://www.csdn.net/nav/game
https://www.csdn.net/nav/java
https://www.csdn.net/nav/ops
https://www.csdn.net/nav/arch
https://www.csdn.net/nav/5g
https://ai.csdn.net/
https://www.csdn.net/nav/avi
https://www.csdn.net/nav/mobile
https://blog.csdn.net/weixin_56797974
https://www.csdn.net/nav/career
https://www.csdn.net/nav/sec
https://www.csdn.net/nav/fund
https://ac.csdn.net/
https://www.csdn.net/nav/iot
https://www.csdn.net/nav/db
https://www.csdn.net/nav/web
https://cloud.csdn.net
https://blog.csdn.net/nav/blockchain
https://plugin.csdn.net/
https://so.csdn.net/plugin/jsonPages.html
https://so.csdn.net/plugin/qrcode.html?query=https://csdn.net
https://www.csdn.net/nav/other
None
https://live.csdn.net/
https://huiyi.csdn.net/
https://gitchat.csdn.net/aggregation/columns?utm_source=ColumnsChannel
https://edu.csdn.net/
https://blog.csdn.net/weixin_39786569/article/details/115690916
https://blog.csdn.net/weixin_39786569/article/details/115690916
https://ask.csdn.net/questions/7416273
https://ask.csdn.net/questions/7416273
https://blink.csdn.net/details/1186592
https://blink.csdn.net/details/1186592
https://ask.csdn.net/questions/7416244
https://ask.csdn.net/questions/7416244
https://blog.csdn.net/csdnsevenn/article/details/115499523
https://blog.csdn.net/csdnsevenn/article/details/115499523
https://ask.csdn.net/rank?type=10&issueId=2&utm_source=349478199
https://ask.csdn.net/rank?type=10&issueId=2&utm_source=349478199
https://blog.csdn.net/m0_46163918/article/details/115404101
https://blog.csdn.net/m0_46163918/article/details/115404101
https://edu.csdn.net/huiyiCourse
https://edu.csdn.net/huiyiCourse
https://blink.csdn.net/details/1192704
https://blink.csdn.net/details/1192704
https://blink.csdn.net/details/1161262
https://blink.csdn.net/details/1161262
https://blink.csdn.net/details/1190390
https://blink.csdn.net/details/1190390
https://blog.csdn.net/weixin_39786569/article/details/115343105
https://blog.csdn.net/weixin_39786569/article/details/115343105
https://blog.csdn.net/qq_35067322/article/details/115344186
https://blog.csdn.net/qq_35067322/article/details/115344186
https://blog.csdn.net/zhengwangzw/article/details/115313025
https://blog.csdn.net/zhengwangzw/article/details/115313025
https://blog.csdn.net/weixin_39786569/article/details/115317157
https://blog.csdn.net/weixin_39786569/article/details/115317157
https://csdnnews.blog.csdn.net/article/details/115683312
https://so.csdn.net/so/search/s.do?q=Logica
https://so.csdn.net/so/search/s.do?q=SQL
https://so.csdn.net/so/search/s.do?q=编程语言
https://blog.csdn.net/csdnnews
https://csdnnews.blog.csdn.net/article/details/115689764
https://csdnnews.blog.csdn.net/article/details/115689764
https://blog.csdn.net/csdnnews
https://blog.csdn.net/HyperAI/article/details/115675360
https://blog.csdn.net/HyperAI/article/details/115675360
https://blog.csdn.net/HyperAI
https://blog.csdn.net/hollis_chuang/article/details/115680754
https://blog.csdn.net/hollis_chuang/article/details/115680754
https://blog.csdn.net/hollis_chuang
https://blog.csdn.net/programmer_editor/article/details/115616666
https://blog.csdn.net/programmer_editor/article/details/115616666
https://blog.csdn.net/programmer_editor
https://csdnnews.blog.csdn.net/article/details/115658607
https://csdnnews.blog.csdn.net/article/details/115658607
https://blog.csdn.net/csdnnews
https://csdnnews.blog.csdn.net/article/details/115649568
https://csdnnews.blog.csdn.net/article/details/115649568
https://blog.csdn.net/csdnnews
https://jiangdg.blog.csdn.net/article/details/115637722
https://jiangdg.blog.csdn.net/article/details/115637722
https://blog.csdn.net/AndrExpert
https://csdnnews.blog.csdn.net/article/details/115499986
https://csdnnews.blog.csdn.net/article/details/115499986
https://blog.csdn.net/csdnnews
https://blog.csdn.net/juwikuang/article/details/115559872
https://blog.csdn.net/juwikuang/article/details/115559872
https://blog.csdn.net/juwikuang
https://csdnnews.blog.csdn.net/article/details/115562285
https://csdnnews.blog.csdn.net/article/details/115562285
https://blog.csdn.net/csdnnews
https://blog.csdn.net/BEYONDMA/article/details/115583477
https://blog.csdn.net/BEYONDMA/article/details/115583477
https://blog.csdn.net/BEYONDMA
https://csdnnews.blog.csdn.net/article/details/115562289
https://csdnnews.blog.csdn.net/article/details/115562289
https://blog.csdn.net/csdnnews
None
https://live.csdn.net/room/qq_19734597/n7RkKM3s
https://live.csdn.net/room/baishuiniyaonulia/WGmcPLSd
http://live.csdn.net/v/158141
https://live.csdn.net/room/zxff716/630jpHOx
https://marketing.csdn.net/questions/Q2104081026288435328
https://live.csdn.net/room/epubit17/yZrbp5Z7
https://live.csdn.net/room/csdnnews/8u8EVW5z
http://live.csdn.net/v/152873
http://live.csdn.net/v/154143
http://live.csdn.net/v/158200
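The output above contains `None` entries, which come from <a> tags without an `href`. It also contains many URLs that are printed twice. The sketch below is one possible clean-up step, assuming only unique, absolute URLs are wanted. It skips tags without an `href`, resolves relative links against the page URL with `urllib.parse.urljoin`, and drops duplicates while keeping first-seen order. The function name `unique_links` and the sample HTML are illustrative only and not part of the original post.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def unique_links(html, base_url):
    """Hypothetical helper: unique absolute URLs from <a href> tags, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags without an href (the source of the None lines)
    urls = (urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True))
    # dict.fromkeys removes duplicates while preserving first-seen order
    return list(dict.fromkeys(urls))


# Hypothetical sample HTML with a duplicate, a relative link and a missing href
html = """
<a href="https://blog.csdn.net/csdnnews">CSDN News</a>
<a href="https://blog.csdn.net/csdnnews">CSDN News (again)</a>
<a href="/nav/python">Python</a>
<a name="no-href">No link</a>
"""

print(unique_links(html, "https://www.csdn.net"))
# ['https://blog.csdn.net/csdnnews', 'https://www.csdn.net/nav/python']
```

With the live page, the same helper could be called as `unique_links(r.text, "https://www.csdn.net")`.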

 

posted @ 2021-04-14 14:30  挖掘机斯基