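In Detail: A Tutorial on Generating a Website sitemap.xml
To generate a sitemap.xml file, you need a crawler that collects every valid link on the site. A complete solution follows:
Step 1: Install the required Python libraries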
pip install requests beautifulsoup4 lxml
Step 2: Create the Python crawler script (sitemap_generator.py)
import requests
Step 3: Run the script
python sitemap_generator.py
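How it works:
Crawler logic:
- Starts a breadth-first search from the home page https://www.91kaiye.cn/ and automatically filters out off-site links, anchors, and invalid URLs
- Records a last-modified date for each page (defaults to the current day)
- Sets the change frequency to daily and the priority to 0.8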
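The crawler logic described above can be sketched as follows. This is a minimal illustration, not the original script: it swaps in the standard library (html.parser and urllib) for requests and BeautifulSoup so that it runs without extra packages, and the names LinkParser, fetch_html, crawl, and max_pages are assumptions made for this example.

```python
from collections import deque
from datetime import date
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher (assumed helper): page body, or None on any error."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl restricted to the host of start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable or non-HTML URL: leave it out
        pages.append({
            "loc": url,
            "lastmod": date.today().isoformat(),  # defaults to today
            "changefreq": "daily",
            "priority": "0.8",
        })
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop the #anchor part.
            link, _fragment = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # mailto:, javascript:, or off-site link
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling crawl("https://www.91kaiye.cn/") would return one dictionary per reachable page. The fetch parameter is injectable so that the crawl order can be checked against canned HTML without touching the network.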
Output file:
- The generated sitemap.xml lists one entry per crawled page, with its URL, last-modified date, change frequency, and priority.
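As a rough illustration of how such a file could be written, the sketch below serializes a list of page records with xml.etree.ElementTree, using the element names and namespace defined by the sitemaps.org protocol. The function name build_sitemap and the dictionary keys (loc, lastmod, changefreq, priority) are assumptions for this example, not taken from the original script.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return the sitemap XML as bytes for a list of page dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            ET.SubElement(url, field).text = page[field]
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(urlset)
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

The result can be saved with pathlib.Path("sitemap.xml").write_bytes(build_sitemap(pages)). Each page becomes a url element containing loc, lastmod, changefreq, and priority children.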
Notes:
Anti-crawling measures:
- If the site has anti-crawling mechanisms, you may need to:
  add a time.sleep(1) delay between requests
  use a proxy IP
  set more realistic request headers
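The three measures above might look like this in practice. This is a sketch using the standard library's urllib rather than requests; the header values, the helper names (make_request, make_opener, polite_fetch), and the proxy address mentioned in the docstring are illustrative assumptions.

```python
import time
from urllib.request import ProxyHandler, Request, build_opener

# Illustrative browser-like headers; adjust to taste.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def make_request(url):
    """Attach the browser-like headers to a request."""
    return Request(url, headers=BROWSER_HEADERS)


def make_opener(proxy=None):
    """Build an opener; proxy is e.g. 'http://127.0.0.1:8080' (hypothetical)."""
    handlers = [ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
    return build_opener(*handlers)


def polite_fetch(url, opener, delay=1.0, sleep=time.sleep):
    """Wait `delay` seconds, then fetch the page body as text."""
    sleep(delay)
    with opener.open(make_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A crawler would create the opener once, for example opener = make_opener(), and call polite_fetch(url, opener) for every page, which spaces requests about one second apart.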
Dynamic content:
For pages rendered by JavaScript (e.g. Vue/React), switch to Selenium or Playwright.
Optimization suggestions:
- Run the script periodically on the server (e.g. once a week)
- Submit the sitemap to Google Search Console
- Add the following to robots.txt:
Sitemap: https://www.91kaiye.cn/sitemap.xml
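If the robots.txt update should happen as part of a scheduled run, a small helper can append the line only when it is missing. This is a sketch; the function name add_sitemap_line and the idea of editing robots.txt from Python are assumptions, not part of the original script.

```python
from pathlib import Path


def add_sitemap_line(robots_path, sitemap_url):
    """Append 'Sitemap: <url>' to robots.txt unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(robots_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    line = f"Sitemap: {sitemap_url}"
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new directive on its own line
    path.write_text(text + line + "\n", encoding="utf-8")
    return True
```

For example, add_sitemap_line("robots.txt", "https://www.91kaiye.cn/sitemap.xml") can be called after every run without creating duplicate entries.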
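Alternative: use an online tool
If you would rather not run code, an online sitemap generator can produce the file for you.
Once it is generated, upload sitemap.xml to the site's root directory and submit it through the Baidu and Google webmaster tools.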