Day 15: A Tour of Common Python Modules

Today's study summary:

Modules

I. The time module

Python has three ways of representing time:

1. Timestamp: meant for computers

The number of seconds elapsed from 1970-01-01 00:00:00 UTC (the Unix epoch) to the current moment.

2. Formatted time (format string): meant for humans

Returns the time as a string, e.g. 2002-01-11.

01 Format the current time object as a string: .strftime

import time
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))  # format the current time object as a string
Result: 2019-11-16 14:33:00

02 Convert a time string into a time object: .strptime

import time
res = time.strptime('2019-01-01', '%Y-%m-%d')    # parse a time string into a time object
print(res)
Result:
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=-1)

3. Structured time object (struct_time)

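It is a tuple-like object with 9 fields: year, month, day, hour, minute, second, day of the week (Monday is 0), day of the year, and the daylight-saving-time flag.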

1. .time()

import time
print(time.time())  # get the current timestamp

Result:
1573884986.1575663

2. .strftime()

Note:

%X is equivalent to %H:%M:%S (in the default C locale; in general %X is the locale's time format)

import time
# get year, month, day, hour, minute, second
print(time.strftime('%Y-%m-%d %H:%M:%S'))
# %X == %H:%M:%S
print(time.strftime('%Y-%m-%d %X'))

Result:
2019-11-16 14:20:24
2019-11-16 14:20:24

3. .localtime()

import time
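# get a time object (****)
print(time.localtime())            # the local time object: a tuple-like value with 9 fields
print(type(time.localtime()))      # its type is time.struct_time
time_obj = time.localtime()        # store the time object in a variable
print(time_obj.tm_year)            # the year field of the time object
print(time_obj.tm_mon)             # the month field of the time object


Result: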
time.struct_time(tm_year=2019, tm_mon=11, tm_mday=16, tm_hour=14, tm_min=26, tm_sec=41, tm_wday=5, tm_yday=320, tm_isdst=0)
<class 'time.struct_time'>
2019
11

import time
res = time.localtime()                                               # take a time object before sleeping
time.sleep(5)

# format the current time; time.localtime() here is taken after the 5-second sleep
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))

# format the time object captured before sleeping
print(time.strftime('%Y-%m-%d %H:%M:%S', res))

Result:

2019-11-16 14:53:14
2019-11-16 14:53:09
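The three forms convert into one another. A minimal sketch of the round trip, reusing the timestamp printed earlier (1573884986) as sample input; time.gmtime and time.mktime are standard time functions that the notes above do not cover:

```python
import time

ts = 1573884986                                  # a timestamp (seconds since the epoch)

st = time.localtime(ts)                          # timestamp -> struct_time (local time)
text = time.strftime('%Y-%m-%d %H:%M:%S', st)    # struct_time -> formatted string
st2 = time.strptime(text, '%Y-%m-%d %H:%M:%S')   # formatted string -> struct_time
ts2 = time.mktime(st2)                           # struct_time (read as local time) -> timestamp

utc_text = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(ts))  # the same instant in UTC
print(text, ts2, utc_text)                       # utc_text is '2019-11-16 06:16:26'
```

mktime treats its argument as local time, so it is the inverse of localtime, not of gmtime.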

II. The datetime module

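Common methods:

1. .date.today()  # current year, month, day

2. .datetime.today()  # current year, month, day, hour, minute, second

3. .datetime.now()  # current local time (Beijing time on the author's machine)

4. .datetime.utcnow()  # current UTC (Greenwich) time; deprecated since Python 3.12 in favour of datetime.now(timezone.utc)

5. .timedelta(days=N)  # a time span (duration) object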

Example: methods 1 and 2

import datetime

print(datetime.date.today())      # current year, month, day

print(datetime.datetime.today())  # current year, month, day, hour, minute, second

date_obj = datetime.date.today()        # a date object (named date_obj so it does not shadow the time module)

time_obj = datetime.datetime.today()    # a datetime object

print(type(date_obj))
print(type(time_obj))
print(time_obj.year)
print(time_obj.month)
print(time_obj.day)


Result:
2019-11-16
2019-11-16 15:15:37.908715
<class 'datetime.date'>
<class 'datetime.datetime'>
2019
11
16

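Example: methods 3 and 4

import datetime
# now() gives local time; utcnow() gives UTC

print(datetime.datetime.now())     # current local time (Beijing time here)

print(datetime.datetime.utcnow())  # current UTC (Greenwich) time


Result: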
2019-11-16 15:21:29.314814
2019-11-16 07:21:29.314814
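Because utcnow() returns a naive datetime and is deprecated since Python 3.12, a timezone-aware version may be preferable. A minimal sketch; the fixed UTC+8 offset used to stand in for Beijing time is an assumption (it ignores any historical rule changes):

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime.now(timezone.utc)          # aware UTC time, replaces datetime.utcnow()

beijing = timezone(timedelta(hours=8))        # assumed fixed UTC+8 offset
bj_now = utc_now.astimezone(beijing)          # same instant, shown in UTC+8

print(utc_now.isoformat())
print(bj_now.isoformat())
```

Aware datetimes compare by the instant they represent, so utc_now == bj_now is True even though they print differently.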

Example: method 5

import datetime

# a timedelta (duration) object
time_obj = datetime.timedelta(days=7)  # a span of 7 days
print(time_obj)

# Date/time arithmetic (*******)
# datetime = datetime + or - timedelta
# timedelta = datetime - datetime  (adding two datetimes is not allowed)

# a datetime:
import datetime
current_time = datetime.datetime.now()  # the current time
print(current_time)

# a timedelta
time_obj = datetime.timedelta(days=7)  # a span of 7 days
print(time_obj)

later_time = current_time + time_obj      # the time 7 days from now: datetime = datetime + timedelta
print(later_time)

time_new_obj = later_time - current_time  # timedelta = datetime - datetime

print(time_new_obj)
Result:

2019-11-16 15:38:37.719635
7 days, 0:00:00
2019-11-23 15:38:37.719635
7 days, 0:00:00
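The same rules with fixed dates, so the results are reproducible. A minimal sketch; the dates are arbitrary sample values:

```python
from datetime import date, datetime, timedelta

start = date(2019, 11, 16)
week_later = start + timedelta(days=7)    # date + timedelta -> date (2019-11-23)
gap = week_later - start                  # date - date -> timedelta (7 days)

a = datetime(2019, 11, 16, 15, 0)
b = datetime(2019, 12, 25, 0, 0)
remaining = b - a                         # datetime - datetime -> timedelta (38 days, 9:00:00)

try:
    a + b                                 # datetime + datetime is not defined
    can_add = True
except TypeError:
    can_add = False

print(week_later, gap, remaining, can_add)
```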

III. The random module

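Commonly used functions:

1. .randint(a, b) returns a random integer in [a, b] (both ends included)

2. .uniform(a, b) returns a random float between a and b

3. .random() takes no arguments and returns a random float in [0.0, 1.0)

4. .choice(seq) picks one random element from a sequence (it must support indexing, e.g. a list, tuple or str)

5. .shuffle(list) shuffles a mutable sequence in place and returns None (immutable types such as tuples and strings do not work)

6. .sample(population, k) returns a new list of k distinct elements chosen at random from population

7. .seed() initializes the random number generator; with no argument it uses the current system time (or an OS randomness source if one is available)

8. chr(97) (a built-in, not part of random) converts a code point into its character, e.g. chr(97) is 'a'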
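A minimal sketch of the less obvious points in the list: a fixed seed makes the sequence repeat, shuffle returns None, and sample never repeats an element. The seed value 42 and the sample data are arbitrary:

```python
import random

random.seed(42)                               # fixed seed: the sequence below repeats
first = [random.randint(1, 6) for _ in range(5)]
random.seed(42)                               # same seed again
second = [random.randint(1, 6) for _ in range(5)]

cards = ['A', 'K', 'Q', 'J']
returned = random.shuffle(cards)              # shuffles the list in place
picked = random.sample(range(10), k=3)        # 3 distinct numbers from 0-9
letter = chr(97)                              # 'a'; ord('a') goes the other way

print(first == second, returned, cards, picked, letter)
```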

Exercise. Requirement:

combine uppercase letters, lowercase letters and digits
into a random 5-character verification code

import random
code = ''                                       # start with an empty string
for i in range(5):
    res1 = random.randint(97, 122)              # code point of a lowercase letter (a-z)
    lower_str = chr(res1)
    res2 = random.randint(65, 90)               # code point of an uppercase letter (A-Z)
    upper_str = chr(res2)
    number = random.randint(0, 9)
    number_str = str(number)                    # convert the random integer to a string
    code_list = [lower_str, upper_str, number_str]  # one candidate of each kind for this position
    random_code = random.choice(code_list)
    code += random_code                         # the loop runs 5 times, so the code has 5 characters
print(code)
Result:
T1s15  (random; differs on every run)

Improvement: allow a code of any length

import random
def get_code(n):
    code = ''
    for i in range(n):
        res1 = random.randint(97, 122)
        lower_str = chr(res1)
        res2 = random.randint(65, 90)
        upper_str = chr(res2)
        number = random.randint(0, 9)
        number_str = str(number)                        # convert the random integer to a string
        code_list = [lower_str, upper_str, number_str]  # one candidate of each kind for this position
        random_code = random.choice(code_list)
        code += random_code                             # the loop runs n times, so the code has n characters
    return code
code = get_code(7)
print(code)
# avoid get_code = get_code(7): it rebinds the name get_code to the returned string, so the function could not be called again
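A shorter variant of the same idea, sketched with the standard string module and random.choices. Note one behavioural difference: here every character is drawn uniformly from all 62 symbols, whereas the version above first picks a category, so digits appear more often there. For codes that protect something real, the secrets module is the tool the Python docs recommend instead of random:

```python
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # a-z, A-Z, 0-9

def get_code(n):
    """n random characters; fine for practice, not for security."""
    return ''.join(random.choices(ALPHABET, k=n))

def get_secure_code(n):
    """Same idea using secrets, intended for security-sensitive tokens."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(n))

print(get_code(7), get_secure_code(7))
```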

Returning a function reference (its "address") as a return value

def test3(u):
    print(f"{u}hello")
    return test3                   # return the function object itself (no parentheses, so it is not called here)
z = test3(2)                       # call once; the returned function object is bound to z
z(3)                               # z refers to test3, so adding () calls it again


Result:

2hello
3hello

IV. The os module: interacting with the operating system

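Note: keep the project root directory and other path values in "constants" (UPPER_CASE names).

Common functions:

1. .path.dirname(__file__) returns the directory that contains the current file

2. .path.join(absolute path, file name) joins path parts, e.g. to build a file's absolute path

3. .path.exists(path) checks whether a file or folder exists and returns a boolean

4. .path.isdir(folder path) checks whether the path exists and is a folder

5. .path.isfile(file path) checks whether the path exists and is a file

6. .mkdir(folder) creates a folder

7. .rmdir(folder) deletes a folder (it must be empty)

8. .listdir(r'absolute folder path') returns the names of all entries in a folder

9. enumerate(iterable) (a built-in) ---> an iterator of (index, element) tuples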
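The functions above can be tried safely inside a throwaway directory. A minimal sketch; the folder name demo and the file a.txt are made up for the example, and tempfile is used so nothing is left behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:        # throwaway folder, deleted afterwards
    folder = os.path.join(base, 'demo')
    os.mkdir(folder)                                # create a folder
    file_path = os.path.join(folder, 'a.txt')
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write('hello')                            # create a file inside it

    checks = (os.path.exists(folder), os.path.isdir(folder), os.path.isfile(file_path))
    numbered = list(enumerate(os.listdir(folder)))  # [(0, 'a.txt')]

    try:
        os.rmdir(folder)                            # refused: the folder is not empty
        rmdir_refused = False
    except OSError:
        rmdir_refused = True

    os.remove(file_path)                            # delete the file first
    os.rmdir(folder)                                # now the empty folder can be removed
    gone = not os.path.exists(folder)

print(checks, numbered, rmdir_refused, gone)
```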

Example: function 1

PS: DAY15_PATH holds the directory of the current file

import os
DAY15_PATH = os.path.dirname(__file__)
print(DAY15_PATH)

Example: functions 8 and 9 (listdir and enumerate)

import os
teacher_list = os.listdir(r'G:\正课\day15')   # .listdir() returns a list with the names of all entries in the folder
print(teacher_list)

res = enumerate(teacher_list)                # enumerate(iterable) returns an iterator of numbered tuples: (0, element), (1, element), ...
print(list(res))

# let the user choose a file
while True:
    # 1. print every teacher's file
    for index, name in enumerate(teacher_list):
        print(f'No.: {index} file name: {name}')

    choice = input('Enter the number of the file to view: ').strip()

    # 2. the input must be a number within the valid range
    # if it is not a number, ask again
    if not choice.isdigit():
        print('Please enter a number')
        continue

    # it is a number: now check that it is in range
    choice = int(choice)

    # if it is out of range, ask again
    if choice not in range(len(teacher_list)):
        print('Number out of range!')
        continue

    file_name = teacher_list[choice]                         # teacher_list is a list; choice is the index

    teacher_path = os.path.join(r'G:\正课\day15', file_name) # join into a full path

    print(teacher_path)

    with open(teacher_path, 'r', encoding='utf-8') as f:
        print(f.read())

V. The sys module: interacting with the Python interpreter

Study the example below.

Methods:

sys.path

sys.path.append(project path)

import sys
import os

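print(sys.path)                                         # the interpreter's module search path: a list of directories

BASE_PATH = os.path.dirname(os.path.dirname(__file__))  # the project root (two levels up from this file)
sys.path.append(BASE_PATH)                              # add the project root to the module search path

# command-line arguments from a terminal, e.g. python3 xxx.py username password
print(sys.argv)                                         # a list: argv[0] is the script name, followed by the arguments (the lesson uses them to pass a username and password)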
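A minimal sketch of reading a username and password from sys.argv. The helper parse_credentials and the script name login.py are hypothetical; keeping the parsing in a function that takes a list makes it testable without a terminal. Be aware that arguments typed on a command line can be visible to other users of the machine, so this only suits exercises:

```python
import sys

def parse_credentials(argv):
    """Hypothetical helper: expects ['script.py', username, password]; returns None otherwise."""
    if len(argv) != 3:
        return None
    _script, username, password = argv
    return username, password

if __name__ == '__main__':
    # e.g. run as: python3 login.py tank 123
    print(sys.argv[0])                  # the script name as given on the command line
    print(parse_credentials(sys.argv))
```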

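VI. hashlib: a hashing (message digest) module, often loosely called an encryption module

It provides many built-in algorithms.
1- MD5: a one-way algorithm whose digest cannot be decrypted (before 2018); today MD5 is considered insecure for protecting passwords

hashlib.md5() creates a hash object

md5_obj.update(password.encode('utf-8')) feeds data into the object; the data must be bytes

md5_obj.hexdigest() returns the digest as a hexadecimal string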

import hashlib

md5_obj = hashlib.md5()                # create an MD5 hash object
print(type(md5_obj))
str1 = '1234'
md5_obj.update(str1.encode('utf-8'))   # update() only accepts bytes, so encode the string first

res = md5_obj.hexdigest()              # the digest as a hex string
print(res)

Result:

<class '_hashlib.HASH'>
81dc9bdb52d04dc20036dbd8313ed055

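2- Digest algorithms:
- A digest is a fixed-length string computed from some content.
- The same content always gives the same digest, and different content gives a different digest with overwhelming probability (collisions exist in theory, and for MD5 they can be produced deliberately).

- A stored ciphertext password is just such a digest.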
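Two follow-ups, sketched below. First, update() can be called several times and gives the same digest as hashing everything at once. Second, because MD5 is fast and unsalted, stored passwords are usually hashed with a salted, deliberately slow function instead; PBKDF2 through hashlib.pbkdf2_hmac is swapped in here as one such option, and the helper names, the 16-byte salt and the 100_000 iteration count are assumptions for illustration:

```python
import hashlib
import hmac
import os

# Same input -> same digest, whether fed in one piece or several
whole = hashlib.md5('1234'.encode('utf-8')).hexdigest()
parts = hashlib.md5()
parts.update(b'12')
parts.update(b'34')
print(whole == parts.hexdigest())   # True

def hash_password(password, salt=None, iterations=100_000):
    """Hypothetical helper: salted PBKDF2-SHA256; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=100_000):
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

The salt is stored next to the digest; it is not secret, it only ensures equal passwords do not produce equal digests.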

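VII. Serialization modules: serialization ("stringification") converts other data types into strings (or bytes) so they can be stored in files

1. The json module: provides a simple way to encode and decode JSON data. It is a serialization module for JSON, a special language-neutral data format. Flow: Python data type ---> JSON data format ---> string ---> file

For another language to use Python's data: file ---> string ---> JSON data format ---> that language's data types

PS:

# In JSON, all strings use double quotes

# Tuples are a special case: when a Python tuple is converted to JSON, it becomes a list (a JSON array)

# Sets cannot be converted to JSON (json.dumps raises TypeError)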
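The tuple and set rules can be checked directly. A minimal sketch; passing default=sorted is one possible workaround for sets, chosen here for illustration:

```python
import json

as_json = json.dumps(('a', 'b'))          # '["a", "b"]': the tuple became a JSON array
back = json.loads(as_json)                # ['a', 'b']: it comes back as a list

try:
    json.dumps({'x', 'y'})                # sets have no JSON equivalent
    set_ok = True
except TypeError:
    set_ok = False

# default= is called for objects json cannot handle; sorted() turns a set into a list
with_set = json.dumps({'tags': {'b', 'a'}}, default=sorted)   # '{"tags": ["a", "b"]}'
print(as_json, back, set_ok, with_set)
```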

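Built-in functions:

01. json.dumps(list/dict/tuple, ensure_ascii=False) converts a list/dict/tuple into a JSON str. A file's write() only accepts strings, so writing a dict directly raises an error; convert it with dumps first.

02. json.loads() parses a JSON str back into a Python object (e.g. a dict or list).

03. json.dump() converts a Python object into JSON text and writes it to a file object.

04. json.load() reads JSON text from a file object and converts it into a Python object.

Example: 01 and 02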

import json
list1 = ['张全蛋', '李小花']
json_str = json.dumps(list1, ensure_ascii=False)   # ensure_ascii defaults to True, which escapes Chinese as \uXXXX sequences; False keeps the characters readable
print(json_str)
print(type(json_str))              # str

python_data=json.loads(json_str)
print(python_data)
print(type(python_data))           # list

Result:
["张全蛋", "李小花"]
<class 'str'>
['张全蛋', '李小花']
<class 'list'>

Tuples are a special case: a Python tuple converted to JSON becomes a list.

import json
tuple1 = ('张全蛋', '李小花')
json_str = json.dumps(tuple1, ensure_ascii=False)
print(json_str)                        # the tuple becomes a JSON array, because JSON has no tuple type
print(type(json_str))                  # str


python_data = json.loads(json_str)     # loads returns a list
print(tuple(python_data))              # convert it back to a tuple explicitly
print(type(tuple(python_data)))        # tuple

Result:

["张全蛋", "李小花"] # a tuple would print with (), but JSON uses []
<class 'str'>
('张全蛋', '李小花')
<class 'tuple'>

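PS: Data types differ between languages, even when they look alike.
For example, Python cannot use another language's data types directly;
the other language's data must first be converted into JSON,
and once Python receives the JSON it can convert it into Python data types.

Difference between loads and load

1. loads
loads works on in-memory data
loads: converts a JSON string into a Python object (e.g. a dict)

2. load
load works on a file object
load: reads JSON from a file and converts it into a Python object (writing to a file is done by dump)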
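A minimal sketch of the two pairs side by side. io.StringIO stands in for a real file opened with open() so the example needs no disk access; that substitution is the only assumption:

```python
import io
import json

data = {'name': 'tank', 'age': 17}

text = json.dumps(data)          # dumps/loads: Python object <-> str held in memory
from_text = json.loads(text)

buffer = io.StringIO()           # stands in for a file object
json.dump(data, buffer)          # dump: write JSON text into the file object
buffer.seek(0)                   # rewind before reading
from_file = json.load(buffer)    # load: read JSON text from the file object and parse it

print(text, from_text == data, from_file == data)
```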

import json
dic = {
    'name': 'tank',
    'age': 17
}

json_str = json.dumps(dic, ensure_ascii=False)
print(json_str)
print(type(json_str))     # str

python_data = json.loads(json_str)
print(python_data)
print(type(python_data))  # dict

Result:

{"name": "tank", "age": 17} # looks like a dict, but it is JSON text, i.e. a string
<class 'str'>
{'name': 'tank', 'age': 17}
<class 'dict'>

Exercise: saving and loading data with json.dumps and json.loads

import json
def register():
    username = input('Username: ').strip()
    password = input('Password: ').strip()
    re_password = input('Confirm password: ').strip()
    if password == re_password:
        user_dic = {
            'name': username, 'pwd': password
        }
        json_str = json.dumps(user_dic, ensure_ascii=False)    # convert to a JSON string with json.dumps
        # note: use .json as the file extension when saving JSON data
        with open('user.json', 'w', encoding='utf-8') as f:
            f.write(json_str)
    else:
        print('The passwords do not match')
        return                                                 # nothing was saved, so do not try to read the file

    with open('user.json', 'r', encoding='utf-8') as f:
        res = json.loads(f.read())
    print(res)

register()

Exercise: saving and loading data with json.dump and json.load

import json
def register():
    username = input('Username: ').strip()
    password = input('Password: ').strip()
    re_password = input('Confirm password: ').strip()
    if password == re_password:
        user_dic = {
            'name': username, 'pwd': password
        }

        with open('user1.json', 'w', encoding='utf-8') as f:
            json.dump(user_dic, f)                           # convert the dict to JSON and write it to the file

        with open('user1.json', 'r', encoding='utf-8') as f:
            user_dic = json.load(f)                          # read the JSON from the file back into a dict
            print(user_dic)
            print(type(user_dic))                            # dict
    else:
        print('The passwords do not match')
register()

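2. The pickle module: Python's built-in serialization module.

Methods:

pickle.dump(dict/list/tuple/str/..., file object)

pickle.load(file object)

Pros:
- supports almost all Python data types (sets included)
- stores data directly as "bytes", which makes reading and writing faster

Cons: (a fatal one)
- only Python can read it; it is not cross-language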
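pickle also has in-memory counterparts, dumps and loads, which make the bytes visible without touching a file. A minimal sketch; note that the Python documentation warns that unpickling can execute arbitrary code, so only load pickle data from sources you trust:

```python
import pickle

teachers = {'tank', 'sean', 'jason'}
blob = pickle.dumps(teachers)       # Python object -> bytes in memory (dump writes them to a file)
restored = pickle.loads(blob)       # bytes -> Python object (load reads them from a file)
print(type(blob), restored == teachers)
```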

import pickle

set1 = {
    'tank', 'sean', 'jason', '大脸'
}

# write: dump
with open('teacher.pickle', 'wb') as f:
    pickle.dump(set1, f)                       # written to the file as bytes


# read: load
with open('teacher.pickle', 'rb') as f:
    python_set = pickle.load(f)               # load turns the bytes read from the file back into a Python object
    print(python_set)
    print(type(python_set))

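VIII. The collections module: provides data types beyond Python's eight basic built-in data types.

1. namedtuple in collections: a tuple whose fields have names.

from collections import namedtuple

Use case 1: playing cards
from collections import namedtuple
card = namedtuple('扑克牌',  ['color', 'number'])  # create a playing-card type ('扑克牌' means "playing card")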
red_A = card('♥', 'A')
print(red_A)
black_K = card('♠', 'K')
print(black_K)

Result:
扑克牌(color='♥', number='A')
扑克牌(color='♠', number='K')

Use case 2: coordinates; the field names are given in order
from collections import namedtuple
point = namedtuple('坐标', ['x', 'y'])  # the second argument can be a list,
point1 = namedtuple('坐标', ('x', 'y'))  # a tuple,
point2 = namedtuple('坐标', 'x y')       # or a space-separated string ('坐标' means "coordinate")


p = point(1, 3)  # the number of arguments must match the number of field names (internally the class itself is passed too; see the OOP lesson)

print(p.x)        # 1
print(p.y)        # 3
p1 = point(4, 7)
p2 = point(5, 6)
print(p, p1, p2)
print(type(p))

Result:

1
3
坐标(x=1, y=3) 坐标(x=4, y=7) 坐标(x=5, y=6)
<class '__main__.坐标'>
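A named tuple is still a tuple, with a few helpers on top. A minimal sketch of the commonly used ones (the type name Point is chosen for the example):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 3)

x, y = p                        # unpacks like an ordinary tuple
same = p[0] == p.x              # index and attribute access give the same value
fields = Point._fields          # ('x', 'y')
as_dict = p._asdict()           # {'x': 1, 'y': 3}
moved = p._replace(x=10)        # a new Point(x=10, y=3); p itself is unchanged

try:
    p.x = 5                     # fields are read-only
    frozen = False
except AttributeError:
    frozen = True

print(x, y, same, fields, as_dict, moved, frozen)
```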

2. OrderedDict in collections: before Python 3.7 a regular dict did not guarantee key order, so collections provides an ordered dictionary. OrderedDict is a subclass of dict, so it has all of dict's methods. (Since Python 3.7, plain dicts also keep insertion order.)

from collections import OrderedDict

Usage: OrderedDict(a regular dict)

from collections import OrderedDict
order_dict = OrderedDict({'x': 1, 'y': 2, 'z': 3})
print(order_dict)
print(order_dict.get('y'))          # the value for key 'y'
print(order_dict['z'])              # the value for key 'z'
print(type(order_dict))
for line in order_dict:             # iterating a dict yields its keys
    print(line)                     # keys come out in insertion order
Result:

OrderedDict([('x', 1), ('y', 2), ('z', 3)])
2
3
<class 'collections.OrderedDict'>
x
y
z



dic = dict({'x': 1, 'y': 2, 'z': 3})
print(dic)
print(dic.get('y'))                  # the value for key 'y'
print(dic['z'])                      # the value for key 'z'
print(type(dic))
for line in dic:                     # iterating yields the keys
    print(line)                      # since Python 3.7 a plain dict also keeps insertion order

Result:

{'x': 1, 'y': 2, 'z': 3}
2
3
<class 'dict'>
x
y
z
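Since plain dicts now keep insertion order too, the remaining differences are in OrderedDict's extra behaviour. A minimal sketch of the main ones:

```python
from collections import OrderedDict

a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
ordered_equal = a == b                 # False: equality between OrderedDicts checks order
plain_equal = dict(a) == dict(b)       # True: plain dict equality ignores order

a.move_to_end('x')                     # move key 'x' to the end
keys_after_move = list(a)              # ['y', 'x']
first_item = a.popitem(last=False)     # ('y', 2): pop from the front instead of the back

print(ordered_equal, plain_equal, keys_after_move, first_item)
```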

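IX. The openpyxl module: a third-party module for working with Excel spreadsheets

1. Workbook in openpyxl

Methods: Workbook() creates an Excel workbook object

wb_obj.create_sheet('sheet name') creates a worksheet

wb_obj.save('file name.xlsx') saves the workbook to a file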

from openpyxl import Workbook
wb_obj = Workbook()                      # create an Excel workbook object
wb1 = wb_obj.create_sheet('序列化1')     # create a worksheet
wb2 = wb_obj.create_sheet('序列化2')     # create another worksheet

print(wb1.title)
wb1.title = '大宝贝'
print(wb1.title)

wb_obj.save('序列化.xlsx')   # write the Excel file
print('Excel file created')

Exercise: write 100 values

from openpyxl import Workbook
wb_obj = Workbook()
wb1 = wb_obj.create_sheet('工作表1')
n = 1
for line in range(100):

    wb1['A%s' % n] = line + 1    # wb1['cell'] = value: 'A%s' % n is column A, row n; line + 1 is the value written
    n += 1
wb_obj.save('100条数据.xlsx')

Exercise 2: write a dict's items into the sheet

from openpyxl import Workbook
wb_obj=Workbook()
wb1=wb_obj.create_sheet('表格1')
dict1 = {
    'name': 'tank',
    'age': 17
}
n = 1
for key, value in dict1.items():    # dict1.items() yields (key, value) pairs
    wb1['A%s' % n] = key
    wb1['B%s' % n] = value
    n += 1

wb_obj.save('批量插入数据.xlsx')

2. load_workbook in openpyxl

from openpyxl import load_workbook
wb_obj = load_workbook('序列化.xlsx')  # open an existing Excel file
print(wb_obj)

wb1 = wb_obj['序列化2']                # wb_obj['sheet name']
print(wb1['A10'].value)                # the value of cell A10 in sheet 序列化2 (None if empty)
wb1['A10'] = 20                        # set cell A10 to 20
print(wb1['A10'].value)
wb_obj.save('序列化.xlsx')             # save the modified workbook
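Besides cell-name strings such as 'A10', openpyxl can append whole rows and address cells by number. A minimal sketch, assuming openpyxl is installed; it works in memory only, so nothing is saved, and the sheet title and data are made up:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active                           # the default sheet created with the workbook
ws.title = 'data'

ws.append(['name', 'age'])               # append writes one row below the last used row
ws.append(['tank', 17])
ws.cell(row=3, column=1, value='sean')   # write by row/column numbers (1-based)

print(ws['B2'].value)                    # 17
rows = list(ws.iter_rows(min_row=1, values_only=True))
print(rows)
```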

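X. The requests module: a third-party module for sending HTTP requests

Methods:

01. requests.get('URL') returns a response object

02. requests.get('URL').content is the response body as bytes

03. requests.get('URL').iter_content() returns a generator that yields the body in chunks (1 byte each by default; pass chunk_size for larger chunks)

04. requests.get('URL').status_code is the HTTP status code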

import requests 

response = requests.get('https://gss3.bdstatic.com/7Po3dSag_xI4khGkpoWK1HF6hhy/baike/w%3D268%3Bg%3D0/sign=9e8b07a25782b2b7a79f3ec20996acd2/aa64034f78f0f736900cd3a70455b319eac413a1.jpg')

print(response.content)
print(response.iter_content())

with open('小泽泽2.jpg', 'wb') as f:
    for line in response.iter_content(chunk_size=1024):   # write the image chunk by chunk
        f.write(line)

Exercise: scraping information about 250 movies

'''
Scrape Douban's Top 250 movies

    Page 1:
        https://movie.douban.com/top250?start=0&filter=

    ...

    Page 9:
        https://movie.douban.com/top250?start=200&filter=

    Page 10:
        https://movie.douban.com/top250?start=225&filter=
'''

import requests
import re
# 1. send the request
def get_page(url):
    response = requests.get(url)
    # response.content  # body as bytes, e.g. images, video, audio
    # response.text  # body as text, e.g. HTML
    return response
# 2. parse the data
# response = get_page('url')
# parser_page(response.text)
def parser_page(text):  # response.text
    res_list = re.findall(
        '<div class="item">.*?<a href="(.*?)">.*?<span class="title">(.*?)</span>.*?<span class="rating_num".*?>(.*?)</span>.*?<span>(.*?)人评价',
        text,
        re.S)                                                           # re.S lets . match newlines too, so the pattern can span the whole multi-line page
    for movie_tuple in res_list:
        print(movie_tuple)
        yield movie_tuple                                                # yield makes this a generator that hands out one movie_tuple at a time
# 3. save the data
# res_list = parser_page(text)
# save_data(res_list)
def save_data(res_list_iter):
    with open('douban.txt', 'a', encoding='utf-8') as f:
        for movie_tuple in res_list_iter:
            movie_url, movie_name, movie_point, movie_num = movie_tuple  # unpack the tuple
            str1 = f'''
            Movie URL: {movie_url}
            Title: {movie_name}
            Rating: {movie_point}
            Number of ratings: {movie_num}  '''
            f.write(str1)
# build the 10 page URLs
n = 0
for line in range(10):
    url = f'https://movie.douban.com/top250?start={n}&filter='
    n += 25
    response = get_page(url)
    res_list_iter = parser_page(response.text)
    # print(res_list_iter)
    save_data(res_list_iter)
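The page URLs and the regular expression can be checked offline. A minimal sketch: range(0, 250, 25) produces the same start values as the n += 25 loop, and the HTML snippet is a hand-written, hypothetical sample shaped the way the pattern expects, not real Douban markup ('人评价' is the page's "people rated" text and must stay as-is):

```python
import re

# start = 0, 25, ..., 225: 10 pages of 25 movies each
urls = [f'https://movie.douban.com/top250?start={start}&filter='
        for start in range(0, 250, 25)]

PATTERN = re.compile(
    '<div class="item">.*?<a href="(.*?)">.*?<span class="title">(.*?)</span>'
    '.*?<span class="rating_num".*?>(.*?)</span>.*?<span>(.*?)人评价',
    re.S,                                   # . also matches newlines
)

# hypothetical sample HTML, not copied from the real site
sample = '''<div class="item">
  <a href="https://example.com/m/1">
  <span class="title">Movie A</span>
  <span class="rating_num" property="v:average">9.7</span>
  <span>100人评价</span>
</div>'''

print(urls[-1])
print(PATTERN.findall(sample))
```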

Asynchronous scraping of Pear Video (pearvideo.com)

# asynchronously scrape Pear Video
import requests
import re
import os
import uuid

from concurrent.futures import ThreadPoolExecutor
pool = ThreadPoolExecutor(100)
# 1. send the request and get the response
def get_page(url):
    print(f'Sending GET request: {url}')
    response = requests.get(url)
    if response.status_code == 200:
        return response
# 2. parse the home page and extract the video ids
def parse_page(response):
    '''
    https://www.pearvideo.com/video_1630253
    https://www.pearvideo.com/video_1630042
    '''
    # match the detail-page id of every video and put them in a list
    id_list = re.findall('href="video_(.*?)"', response.text, re.S)
    # print(len(id_list))
    id_list = list(set(id_list))                              # remove duplicate ids
    # print(len(id_list))
    return id_list
# parse a detail page and get the video link
def parse_detail(res):                                       # parse_detail receives a Future object
    '''
    srcUrl="https://video.pearvideo.com/mp4/adshort/20191206/cont-1630253-14671892_adpkg-ad_hd.mp4"
    srcUrl="(.*?)"
    '''
    res2 = res.result()                                      # the return value of get_page
    print(res2)
    if res2 is None:                                         # get_page returns None for non-200 responses
        return
    movie_url = re.findall('srcUrl="(.*?)"', res2.text, re.S)
    print(movie_url)
    if movie_url:
        movie_url = movie_url[0]
        pool.submit(save_movie, movie_url)


# 3. save the data
def save_movie(movie_url):

    # time.sleep(1)
    # downloading the response is an I/O-bound operation
    response = requests.get(movie_url)

    movie_dir = r'G:\正课\进程与线程\梨视频文件夹'
    movie_path = os.path.join(
        movie_dir, str(uuid.uuid4()) + '.mp4'
    )
    # print(movie_path)
    with open(movie_path, 'wb') as f:
        for line in response.iter_content():
            f.write(line)


if __name__ == '__main__':
    response = get_page('https://www.pearvideo.com/')
    id_list = parse_page(response)
    for id_num in id_list:
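        # each video's detail page
        url = f'https://www.pearvideo.com/video_{id_num}'

        # submit the detail-page scraping task asynchronously

        # add_done_callback(function) registers a callback function

        # add_done_callback(parse_detail): as soon as a get_page task finishes, its result is handed to parse_detail, so downloading and parsing overlap
        # parse_detail receives a Future object; its result() is get_page's return value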
        pool.submit(get_page, url).add_done_callback(parse_detail)

    import datetime

    print(datetime.datetime.now())
    # 21:54 ---> 18:45
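The submit plus add_done_callback pattern can be tried without any network access. A minimal sketch; fetch is a hypothetical stand-in for get_page and on_done for parse_detail:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

results = []
lock = threading.Lock()

def fetch(n):
    """Hypothetical stand-in for get_page: pretend to download page n."""
    return f'page-{n}'

def on_done(future):
    """Callback: receives the Future; future.result() is fetch's return value."""
    with lock:
        results.append(future.result())

with ThreadPoolExecutor(max_workers=4) as pool:
    for n in range(5):
        pool.submit(fetch, n).add_done_callback(on_done)
# leaving the with-block waits for every task, and each callback runs as its task completes

print(sorted(results))
```

Callbacks run in whichever thread completes the future (or immediately, if it is already done), so shared state such as results is guarded with a lock.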

posted @ 2019-11-16 09:34 薛定谔的猫66