Scraping NetEase Cloud Music Lyrics

Scraping the lyrics of every song by an artist on NetEase Cloud Music

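Most pages on NetEase Cloud Music are rendered by JavaScript, so the lyrics cannot be scraped by parsing the page HTML directly. This article shows how to download the lyrics of every song by a given artist by combining an API exposed by NetEase with ordinary scraping techniques.
Fortunately, NetEase exposes a lyric endpoint: http://music.163.com/api/song/lyric?id=song_id&lv=1&kv=1&tv=-1 where song_id stands for the ID of the song.
The key to getting lyrics is therefore finding each song's ID.
On the site, after you click "Artists" on the home page and open the target artist, the artist's page shows only the 50 most popular songs. How do we get the lyrics of all of the artist's songs? The task breaks down into three steps: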

  1. Get the artist's ID
  2. Get the IDs of all of the artist's songs
  3. Get the lyrics of every song
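
As a minimal sketch of the lyric endpoint quoted above, the helper below fills a song ID into the URL. The name build_lyric_url is made up for illustration, and the lv, kv and tv values are copied as-is from the URL in this article; what each of them means is not explained here.

```python
from urllib.parse import urlencode

LYRIC_API = "http://music.163.com/api/song/lyric"


def build_lyric_url(song_id):
    """Return the lyric URL for one song ID (hypothetical helper)."""
    # Parameter values are copied verbatim from the URL quoted in the article.
    params = {"id": song_id, "lv": 1, "kv": 1, "tv": -1}
    return LYRIC_API + "?" + urlencode(params)


# 12345 is a placeholder, not a real song ID.
print(build_lyric_url(12345))
```

Building the query with urlencode avoids hand-concatenating strings, which is where a missing & or = tends to creep in.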

Getting the artist ID

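NetEase Cloud Music assigns every artist a numeric ID. Click "Artists" on the home page, open the artist you want, and the ID appears in the page URL.
[Screenshot: the entry point to the full artist list]
[Screenshot: Eason Chan's ID in the URL]
On NetEase Cloud Music, Eason Chan (陈奕迅) has the ID 2116.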
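
If you would rather copy the URL from the address bar than read the ID by eye, a small helper can pull it out. This is only a sketch: artist_id_from_url is a hypothetical name, and it assumes the ID always travels in an id query parameter, as in the Eason Chan URL.

```python
from urllib.parse import urlsplit, parse_qs


def artist_id_from_url(url):
    """Extract the id query parameter from an artist page URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Browser URLs keep the real path after '#', e.g. /#/artist?id=2116,
    # so fall back to the fragment when the query string is empty.
    query = parts.query or urlsplit(parts.fragment).query
    ids = parse_qs(query).get("id")
    return ids[0] if ids else None


print(artist_id_from_url("https://music.163.com/#/artist?id=2116"))
print(artist_id_from_url("https://music.163.com/artist?id=2116"))
```

Both forms of the URL, with and without the '#', yield the same ID string.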

Getting all song IDs

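The artist's home page lists the artist's popular songs, but only 50 of them. The URL shown in the browser's address bar is https://music.163.com/#/artist?id=2116 but this is not the URL that actually serves the song names. Remove the '#' to get a page that can be scraped directly: https://music.163.com/artist?id=2116 Viewing that page's source shows the song names and their IDs:
[Screenshot: page source containing the song names and IDs]
The next step is to extract that content with BeautifulSoup4 or lxml. Each song's name and ID sit in an a tag. The page contains many a tags, so to select the right ones we go through their container: find the ul element whose class attribute is 'f-hide' and take the a tags beneath it. The href of each a tag gives song_id and its text gives song_name. This article uses lxml and locates the tags with XPath; BeautifulSoup4's lookup methods work equally well.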
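
To make the tag lookup concrete without any network access, here is a small sketch that runs on a made-up HTML fragment shaped like the structure described above. It deliberately uses the standard library's html.parser instead of lxml so that it runs with no third-party packages; the article's own code uses lxml and XPath. SAMPLE_HTML, SongLinkParser and the song IDs are all invented for illustration.

```python
from html.parser import HTMLParser

# Made-up fragment that imitates the structure described in the article.
SAMPLE_HTML = """
<div><a href="/artist?id=2116">artist link to ignore</a></div>
<ul class="f-hide">
  <li><a href="/song?id=111">Song A</a></li>
  <li><a href="/song?id=222">Song B</a></li>
</ul>
"""


class SongLinkParser(HTMLParser):
    """Collect (song_id, song_name) pairs from <a> tags inside <ul class="f-hide">."""

    def __init__(self):
        super().__init__()
        self.in_list = False      # True while inside the target <ul>
        self.current_id = None    # ID of the <a> currently being read, if any
        self.songs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("class") == "f-hide":
            self.in_list = True
        elif tag == "a" and self.in_list:
            # href looks like /song?id=111; keep the part after '='.
            self.current_id = (attrs.get("href") or "").split("=")[-1]

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "a":
            self.current_id = None

    def handle_data(self, data):
        # Only text that sits directly inside a tracked <a> is a song name.
        if self.current_id is not None:
            self.songs.append((self.current_id, data))


parser = SongLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.songs)
```

The link outside the f-hide list is skipped, which is the whole point of anchoring the search on the container rather than on a tags in general.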
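This approach only yields the top 50. How do we get the IDs of all songs? The artist page also lists all of the artist's albums, and each album page shows every song on that album. If we collect the songs from every album, we have the artist's full set of song IDs. So this part splits into two steps: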

  1. Get all album IDs
  2. Get all song IDs under each album

Getting all album IDs

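The album list lives at: "https://music.163.com/artist/album?id=" + str(singer_id) + "&limit=150&offset=0" It takes three parameters: the artist ID, limit (the number of albums shown per page), and offset (the offset into the list). The last two will be familiar to anyone who has studied databases, where they play the same role as LIMIT and OFFSET in a SQL query. If you would rather not deal with paging, set limit large enough that all albums appear on a single page.
For Eason Chan, setting limit to 150 is enough to include every album.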
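
For artists whose albums do not fit on one page, the same URL can be generated once per page. The sketch below assumes that offset counts albums to skip (the usual meaning of an offset, not something this article verifies) and that you already know roughly how many albums there are. album_page_urls and the album count of 30 are made up for illustration.

```python
from urllib.parse import urlencode

ALBUM_LIST_URL = "https://music.163.com/artist/album"


def album_page_urls(singer_id, total_albums, limit=150):
    """Yield one album-list URL per page (hypothetical helper)."""
    # Assumption: offset is the number of albums already covered by earlier pages.
    for offset in range(0, total_albums, limit):
        query = urlencode({"id": singer_id, "limit": limit, "offset": offset})
        yield ALBUM_LIST_URL + "?" + query


# 30 albums at 12 per page is an invented example.
for url in album_page_urls(2116, total_albums=30, limit=12):
    print(url)
```

With the default limit of 150 and fewer than 150 albums, the generator yields a single URL, which matches the one used in this article.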
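View the source of this page: https://music.163.com/artist/album?id=2116&limit=150&offset=0 and you can find every album together with its ID:
[Screenshot: page source containing the album names and IDs]
The album names and IDs can be extracted with XPath or BeautifulSoup4. As before, they sit in a tags on the page.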

    album_url = "https://music.163.com/artist/album?id=" + str(singer_id) + "&limit=150&offset=0"
    html_album = self.get_url_html(album_url)
    album_ids, album_names = self.get_album(html_album)

    def get_url_html(self, url):
        # Fetch the page and parse it into an lxml element tree.
        with requests.Session() as session:
            response = session.get(url, headers=headers)
            text = response.text
            html = etree.HTML(text)
        return html

    def get_album(self, html):
        # Each album link carries its ID after '=' in the href.
        album_ids = html.xpath("//ul[@id='m-song-module']/li/p/a/@href")
        album_names = html.xpath("//ul[@id='m-song-module']/li/p/a/text()")
        album_ids = [ids.split('=')[-1] for ids in album_ids]
        return album_ids, album_names

This step yields the IDs and names of all albums.

Getting all song IDs in an album

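The previous step gave us every album ID. How do we get all the songs in an album from its ID? The album page is at:
"https://music.163.com/album?id=" + str(album_id)
Take Eason Chan's album 《不想放手》 as an example; its ID is 2339617. The page source looks like this:
[Screenshot: page source of the album page]
Scraping the corresponding tags gives the song IDs and song names.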

    def get_all_song_id(self, album_ids):
        # Visit every album page and collect one list of IDs and names per album.
        with requests.Session() as session:
            all_song_ids, all_song_names = [], []
            for album_id in album_ids:
                one_album_url = "https://music.163.com/album?id=" + str(album_id)
                response = session.get(one_album_url, headers=headers)
                text = response.text
                html = etree.HTML(text)
                album_song_ids = html.xpath("//ul[@class='f-hide']/li/a/@href")
                album_song_names = html.xpath("//ul[@class='f-hide']/li/a/text()")
                album_song_ids = [ids.split('=')[-1] for ids in album_song_ids]

                all_song_ids.append(album_song_ids)
                all_song_names.append(album_song_names)

        return all_song_ids, all_song_names

After this step we have the names and IDs of all of the artist's songs.
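
Note that get_all_song_id returns one list per album, so the results still need flattening before use. The sketch below flattens them and also drops repeated song IDs, on the assumption (not checked in this article) that the same song can show up on more than one album, for example on a compilation. flatten_unique and the sample data are made up for illustration.

```python
from itertools import chain


def flatten_unique(ids_per_album, names_per_album):
    """Flatten per-album lists and keep the first occurrence of each song ID."""
    seen = set()
    songs = []
    pairs = zip(chain.from_iterable(ids_per_album),
                chain.from_iterable(names_per_album))
    for song_id, name in pairs:
        if song_id not in seen:
            seen.add(song_id)
            songs.append((song_id, name))
    return songs


# Made-up IDs and names; song "2" appears on two albums, and one album is empty.
ids = [["1", "2"], ["2", "3"], []]
names = [["A", "B"], ["B", "C"], []]
print(flatten_unique(ids, names))
```

chain.from_iterable also copes with an empty outer list, so an artist with no albums simply produces an empty result.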

Getting the lyrics of every song

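The lyrics cannot be scraped from the web page directly, but the lyric endpoint returns them:
http://music.163.com/api/song/lyric?id=song_id&lv=1&kv=1&tv=-1
The URL returns JSON, so the result can be parsed with Python's json module. This article uses simplejson, which provides the same loads interface and gives the same result.
Fetching the JSON result: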

    def get_url_json(self, url):
        with requests.Session() as session:
            response = session.get(url, headers=headers)
            text = response.text
            text_json = simplejson.loads(text)
        return text_json

Parsing the lyrics out of the JSON:

    def parse_lyric(self, text_json):
        try:
            lyric = text_json.get('lrc').get('lyric')
            # Strip bracketed tags such as [00:12.34] from each line.
            regex = re.compile(r'\[.*\]')
            final_lyric = re.sub(regex, '', lyric).strip()
            return final_lyric
        except AttributeError as k:
            # No 'lrc' entry in the response: print the error and return None.
            print(k)
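
The pattern \[.*\] in parse_lyric is greedy, so on each line it removes everything from the first '[' to the last ']'. If a lyric line itself contains brackets, part of the text goes with them. A narrower alternative, sketched here, matches only time tags. It assumes the tags follow the usual LRC form [mm:ss.xx]; the sample payload and the name strip_time_tags are made up for illustration.

```python
import re

# Made-up payload that mimics the shape parse_lyric expects: {"lrc": {"lyric": ...}}.
sample = {
    "lrc": {
        "lyric": "[00:00.00]First line\n[00:05.20][00:30.10]Second [live] line\n"
    }
}

# Matches only time tags such as [00:05.20]; other bracketed text is left alone.
TIME_TAG = re.compile(r"\[\d+:\d+(?:\.\d+)?\]")


def strip_time_tags(lyric):
    """Remove LRC time tags and trim surrounding whitespace."""
    return TIME_TAG.sub("", lyric).strip()


print(strip_time_tags(sample["lrc"]["lyric"]))
```

The trade-off is that non-time tags (for example metadata tags, if the response contains any) are kept, whereas the article's pattern removes every bracketed span.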

The main flow for fetching all of an artist's lyrics is shown below. The results are saved to files:

    def get_all_song_lyric(self, singer_id):
        album_url = "https://music.163.com/artist/album?id=" + str(singer_id) + "&limit=150&offset=0"
        html_album = self.get_url_html(album_url)
        album_ids, album_names = self.get_album(html_album)
        all_song_ids, all_song_names = self.get_all_song_id(album_ids)
        # Flatten the per-album lists; the [] start value covers the no-album case.
        all_song_ids = reduce(operator.add, all_song_ids, [])
        all_song_names = reduce(operator.add, all_song_names, [])
        print(all_song_ids)
        print(all_song_names)
        for song_id, song_name in zip(all_song_ids, all_song_names):
            url_song = 'http://music.163.com/api/song/lyric?' + 'id=' + str(song_id) + '&lv=1&kv=1&tv=-1'
            json_text = self.get_url_json(url_song)
            print(song_name)
            try:
                # The target directory must already exist.
                with open('D:/lyric/陈奕迅/' + str(song_name) + ".txt", 'w+', encoding='utf-8') as f:
                    f.write(self.parse_lyric(json_text))
            except Exception as e:
                # Songs without lyrics or with names that are not valid file names end up here.
                print(song_name, e)
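
Because the file name is taken straight from the song title, titles containing characters such as / or ? cannot be opened as paths and the song is skipped. One way around this is to sanitize the name first. This is a sketch under stated assumptions: lyric_path is a hypothetical helper, the character set is the list of characters Windows rejects in file names, and replacing them with an underscore is an arbitrary choice.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names, plus control characters.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def lyric_path(base_dir, song_name):
    """Build <base_dir>/<sanitized song name>.txt (hypothetical helper)."""
    safe = INVALID_CHARS.sub("_", song_name).strip() or "untitled"
    return Path(base_dir) / (safe + ".txt")


# "What/If?" is an invented title containing two problematic characters.
print(lyric_path("lyric", "What/If?"))
```

Two different titles can collapse to the same sanitized name, so a real run might also want to append the song ID to keep files distinct.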

Tips:

  • Some tracks are instrumental and have no lyrics, which makes the lyric parsing fail, so wrap the parsing in a try block.
    def parse_lyric(self, text_json):
        try:
            lyric = text_json.get('lrc').get('lyric')
            regex = re.compile(r'\[.*\]')
            final_lyric = re.sub(regex, '', lyric).strip()
            return final_lyric
        except AttributeError as k:
            print(k)
  • This article uses lxml with XPath to read tag content and attributes, so make sure each XPath expression targets exactly the right elements.
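
An alternative to catching the exception is to check the payload explicitly and skip songs that carry no lyrics. The exact response for an instrumental track is not shown in this article, so the sketch below simply assumes that the 'lrc' entry is missing or its 'lyric' string is empty. extract_lyric and both sample payloads are made up for illustration.

```python
def extract_lyric(payload):
    """Return the raw lyric string, or None when the payload carries no lyrics."""
    lrc = payload.get("lrc") if isinstance(payload, dict) else None
    if not isinstance(lrc, dict):
        return None
    lyric = lrc.get("lyric")
    # Treat an empty or whitespace-only string as "no lyrics" as well.
    return lyric if isinstance(lyric, str) and lyric.strip() else None


# Both payloads are invented; the second stands in for a track without lyrics.
with_lyric = {"lrc": {"lyric": "[00:00.00]la la la"}}
without_lyric = {"code": 200}

for payload in (with_lyric, without_lyric):
    lyric = extract_lyric(payload)
    print("skip" if lyric is None else lyric)
```

Returning None for every "no lyrics" case lets the caller decide once whether to write a file, instead of creating an empty file and failing afterwards.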

The full code:

import requests
from lxml import etree
import simplejson
import re
import operator
from functools import reduce

ua = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
headers = {
    'User-agent': ua
}


class CrawlerLyric:
    def __init__(self):
        self.author_name = ""

    def get_url_html(self, url):
        # Fetch the page and parse it into an lxml element tree.
        with requests.Session() as session:
            response = session.get(url, headers=headers)
            text = response.text
            html = etree.HTML(text)
        return html

    def get_url_json(self, url):
        with requests.Session() as session:
            response = session.get(url, headers=headers)
            text = response.text
            text_json = simplejson.loads(text)
        return text_json

    def parse_song_id(self, html):
        song_ids = html.xpath("//ul[@class='f-hide']//a/@href")
        song_names = html.xpath("//ul[@class='f-hide']//a/text()")
        self.author_name = html.xpath('//title/text()')
        # Each href looks like /song?id=<song_id>; drop the 9-character prefix.
        song_ids = [ids[9:len(ids)] for ids in song_ids]
        return self.author_name, song_ids, song_names

    def parse_lyric(self, text_json):
        try:
            lyric = text_json.get('lrc').get('lyric')
            # Strip bracketed tags such as [00:12.34] from each line.
            regex = re.compile(r'\[.*\]')
            final_lyric = re.sub(regex, '', lyric).strip()
            return final_lyric
        except AttributeError as k:
            # No 'lrc' entry in the response: print the error and return None.
            print(k)

    def get_album(self, html):
        album_ids = html.xpath("//ul[@id='m-song-module']/li/p/a/@href")
        album_names = html.xpath("//ul[@id='m-song-module']/li/p/a/text()")
        album_ids = [ids.split('=')[-1] for ids in album_ids]
        return album_ids, album_names

    def get_top50(self, sing_id):
        url_singer = 'https://music.163.com/artist?id=' + str(sing_id)  # 陈奕迅 (Eason Chan)
        html_50 = self.get_url_html(url_singer)
        author_name, song_ids, song_names = self.parse_song_id(html_50)
        # print(author_name, song_ids, song_names)
        for song_id, song_name in zip(song_ids, song_names):
            url_song = 'http://music.163.com/api/song/lyric?' + 'id=' + str(song_id) + '&lv=1&kv=1&tv=-1'
            json_text = self.get_url_json(url_song)
            print(song_name)
            print(self.parse_lyric(json_text))
            print('-' * 30)

    def get_all_song_id(self, album_ids):
        # Visit every album page and collect one list of IDs and names per album.
        with requests.Session() as session:
            all_song_ids, all_song_names = [], []
            for album_id in album_ids:
                one_album_url = "https://music.163.com/album?id=" + str(album_id)
                response = session.get(one_album_url, headers=headers)
                text = response.text
                html = etree.HTML(text)
                album_song_ids = html.xpath("//ul[@class='f-hide']/li/a/@href")
                album_song_names = html.xpath("//ul[@class='f-hide']/li/a/text()")
                album_song_ids = [ids.split('=')[-1] for ids in album_song_ids]

                all_song_ids.append(album_song_ids)
                all_song_names.append(album_song_names)

        return all_song_ids, all_song_names

    def get_all_song_lyric(self, singer_id):
        album_url = "https://music.163.com/artist/album?id=" + str(singer_id) + "&limit=150&offset=0"
        html_album = self.get_url_html(album_url)
        album_ids, album_names = self.get_album(html_album)
        all_song_ids, all_song_names = self.get_all_song_id(album_ids)
        # Flatten the per-album lists; the [] start value covers the no-album case.
        all_song_ids = reduce(operator.add, all_song_ids, [])
        all_song_names = reduce(operator.add, all_song_names, [])
        print(all_song_ids)
        print(all_song_names)
        for song_id, song_name in zip(all_song_ids, all_song_names):
            url_song = 'http://music.163.com/api/song/lyric?' + 'id=' + str(song_id) + '&lv=1&kv=1&tv=-1'
            json_text = self.get_url_json(url_song)
            print(song_name)
            try:
                # The target directory is hard-coded and must already exist.
                with open('D:/lyric/陈奕迅/' + str(song_name) + ".txt", 'w+', encoding='utf-8') as f:
                    f.write(self.parse_lyric(json_text))
            except Exception as e:
                # Songs without lyrics or with names that are not valid file names end up here.
                print(song_name, e)


if __name__ == "__main__":
    sing_id = '2116'  # 陈奕迅 (Eason Chan)
    sing_id_chenli = '1007170'  # 陈粒 (Chen Li)
    c = CrawlerLyric()
    c.get_all_song_lyric(sing_id_chenli)

posted @ 2020-03-17 23:07  身为风帆终有岸