I: Processes

1: Creating multiple processes

Method 1: the common way

import multiprocessing

# Way 1
def task(arg):
    print(arg)

if __name__ == '__main__':
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i,))
        p.start()


# Way 2
def task(arg):
    print(arg)

def run():
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i,))
        p.start()

if __name__ == '__main__':
    run()

 

Method 2: object-oriented (subclass multiprocessing.Process)

import multiprocessing

class MyProcess(multiprocessing.Process):
    def run(self):
        print('current process', multiprocessing.current_process())

def run():
    p1 = MyProcess()
    p1.start()

    p2 = MyProcess()
    p2.start()

if __name__ == '__main__':
    run()

# current process <MyProcess(MyProcess-1, started)>
# current process <MyProcess(MyProcess-2, started)>

 

2: Data is not shared between processes

Example:

import multiprocessing

data_list = []

def task(arg):
    data_list.append(arg)   # each child process appends to its own copy of data_list
    print(data_list)

def run():
    for i in range(3):
        p = multiprocessing.Process(target=task, args=(i,))
        p.start()

if __name__ == '__main__':
    run()

# [0]
# [1]
# [2]
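For contrast, the same pattern with threads does share data_list, because all threads live in the same process. A minimal sketch, not from the original post:

import threading

data_list = []

def task(arg):
    data_list.append(arg)   # every thread appends to the same list
    print(data_list)

for i in range(3):
    t = threading.Thread(target=task, args=(i,))
    t.start()

# typical output: [0]  [0, 1]  [0, 1, 2]  -- the list accumulates across threads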

 

 

3: Common methods

join / daemon / name / multiprocessing.current_process() / multiprocessing.current_process().ident / pid

import time
import multiprocessing

def task(arg):
    p = multiprocessing.current_process()  # the Process object for the current process
    print(p.name)   # process name
    print(p.ident)  # process id
    print(p.pid)    # process id (alias of ident)

    time.sleep(2)
    print(arg)


def run():
    print('111111111')
    p1 = multiprocessing.Process(target=task, args=(1,))
    p1.name = 'pp1'  # set the process name
    # p1.daemon = True  # daemon child: the main process does not wait for it, and it is killed when the main process exits
    p1.start()

    print('222222222')
    # p1.join()  # make the main process wait for the child to finish


if __name__ == '__main__':
    run()
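A small sketch, not from the original post, to make the join / daemon difference concrete:

import time
import multiprocessing

def task(arg):
    time.sleep(1)
    print('child done', arg)

if __name__ == '__main__':
    p = multiprocessing.Process(target=task, args=(1,))
    p.start()
    p.join()                  # block here until the child finishes
    print('main continues')   # printed after 'child done 1'

    d = multiprocessing.Process(target=task, args=(2,))
    d.daemon = True           # daemon child: terminated when the main process exits
    d.start()
    print('main exiting')     # the daemon child is killed before it can print 'child done 2'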

 

 

II: Sharing data between processes

Data is not shared between processes by default, so how can we make processes share data?

 

Way 1: Queue

 

import multiprocessing

q = multiprocessing.Queue()

def task(arg, q):
    q.put(arg)

def run():
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i, q,))
        p.start()

    for _ in range(10):  # read exactly as many items as were put; a bare `while True` would block forever after the last get()
        v = q.get()
        print(v)

if __name__ == '__main__':  # required on Windows, otherwise it raises an error; not needed on Linux
    run()

 

 

Way 2: Manager dict

On Linux:

import multiprocessing

m = multiprocessing.Manager()
dic = m.dict()

def task(arg):
    dic[arg] = 100

def run():
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i,))  # note: args must be a tuple, i.e. (i,)
        p.start()

    input('>>>')  # pause the main process until the children have finished writing
    print(dic.values())

run()  # on Linux (fork start method) the __main__ guard is not required

On Windows:

Option 1: join each child so the main process waits

import multiprocessing

def task(arg, dic):
    dic[arg] = 100

if __name__ == '__main__':
    m = multiprocessing.Manager()
    dic = m.dict()
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i, dic))
        p.start()
        p.join()  # wait for this child before starting the next one

    print(dic.values())

Repeated testing shows that if the main process finishes while child processes are still running, a child that then tries to use data owned by the main process (here, the Manager dict) will fail, because the Manager has already shut down, and an error is raised. There are two ways around this: the first is to make the main process wait until every child has finished before it exits; the second is to have the main process loop, counting how many children are still alive, and leave the loop only once all of them have exited, so the main process can then close safely. (A parallel variant of the first way is sketched right after this block; Option 2 below shows the polling approach.)
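Note that calling p.join() inside the start loop runs the children one at a time. A minimal sketch, not from the original post, that starts them all first and only then waits:

import multiprocessing

def task(arg, dic):
    dic[arg] = 100

if __name__ == '__main__':
    m = multiprocessing.Manager()
    dic = m.dict()

    children = []
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i, dic))
        p.start()          # start every child first so they run concurrently
        children.append(p)

    for p in children:
        p.join()           # then wait for all of them before touching the dict

    print(dic.values())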

 

 

Option 2: poll until every child has exited

import multiprocessing

def task(arg, dic):
    dic[arg] = 100

if __name__ == '__main__':
    m = multiprocessing.Manager()
    dic = m.dict()

    process_list = []
    for i in range(5):
        p = multiprocessing.Process(target=task, args=(i, dic,))
        p.start()
        process_list.append(p)

    while True:
        count = 0
        for p in process_list:
            if not p.is_alive():  # check whether this child has exited
                count += 1
        if count == len(process_list):  # all children have exited, safe to leave the loop
            break
    print(dic)

 

 

Way 3: in practice, data sharing is more often done this way (pseudocode)

 

import multiprocessing

def task(url):
    pass

if __name__ == '__main__':
    while True:
        # connect to the designated server
        # fetch a url from that machine
        url = 'adfasdf'
        p = multiprocessing.Process(target=task, args=(url,))
        p.start()

 

 

How to obtain the connection:
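The original post leaves this part blank. One possible way, shown only as an assumption-laden sketch: keep the pending urls in a Redis list on the shared server (the host address and the key name 'urls' below are hypothetical):

import redis
import multiprocessing

def task(url):
    pass  # crawl the url here

if __name__ == '__main__':
    conn = redis.Redis(host='192.168.1.100', port=6379)  # hypothetical shared server
    while True:
        url = conn.lpop('urls')     # pop the next url from the hypothetical 'urls' list
        if url is None:
            break                   # nothing left to crawl
        p = multiprocessing.Process(target=task, args=(url.decode(),))
        p.start()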

 

 

 

 

III: Process locks

1: Why do we need process locks?

Because when processes operate on shared data, a safety mechanism is needed so that they do not modify it at the same time.

 

2: Example:

import time
import multiprocessing

def task(arg, lock):
    print('鬼子来了')
    lock.acquire()
    time.sleep(2)
    print(arg)
    lock.release()


if __name__ == '__main__':
    # pass the lock to the children; under the Windows spawn start method a module-level lock would not be shared
    lock = multiprocessing.RLock()

    p1 = multiprocessing.Process(target=task, args=(1, lock))
    p1.start()

    p2 = multiprocessing.Process(target=task, args=(2, lock))
    p2.start()
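The example above only serializes the two prints. To see what actually goes wrong without a lock, here is a sketch, not from the original post, in which several processes increment a shared counter; without the lock some increments are lost:

import multiprocessing

def add(counter, lock):
    for _ in range(100000):
        with lock:                # remove this `with lock:` line and the final count is usually far below 400000
            counter.value += 1    # read-modify-write on shared memory is not atomic

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0, lock=False)  # a raw shared int with no built-in lock
    lock = multiprocessing.Lock()
    children = [multiprocessing.Process(target=add, args=(counter, lock)) for _ in range(4)]
    for p in children:
        p.start()
    for p in children:
        p.join()
    print(counter.value)  # 400000 with the lock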

 

IV: Process pools

Example:

import time
from concurrent.futures import ProcessPoolExecutor

def task(arg):
    time.sleep(2)
    print(arg)

if __name__ == '__main__':
    pool = ProcessPoolExecutor(5)  # a pool of at most 5 worker processes
    for i in range(10):
        pool.submit(task, i)
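A small extension, not from the original post, showing how to get return values back from the pool via the Future objects that submit() returns:

import time
from concurrent.futures import ProcessPoolExecutor

def task(arg):
    time.sleep(1)
    return arg * 2

if __name__ == '__main__':
    with ProcessPoolExecutor(5) as pool:         # the with-block waits for all workers on exit
        futures = [pool.submit(task, i) for i in range(10)]
        results = [f.result() for f in futures]  # result() blocks until that task is done
    print(results)  # [0, 2, 4, ..., 18]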

 

V: Crawler modules (requests / BeautifulSoup)

 

1: Installation

Option 1: install from the terminal

 

pip3 install requests

pip3 install beautifulsoup4

 

Problems you may run into during installation:

(1) pip3 is not recognized as an internal or external command?

    Option 1: run pip3 by its absolute path
        C:\Users\Administrator\AppData\Local\Programs\Python\Python36\Scripts\pip3 install requests

    Option 2: add the Scripts directory to the system PATH environment variable
        C:\Users\Administrator\AppData\Local\Programs\Python\Python36\Scripts
    and then run:
        pip3 install requests

Option 2: install through PyCharm

 

 

2: The requests module

1: Purpose:

Simulate a browser sending requests to a server.

2: What it does underneath (see the raw-socket sketch after this list):

  - create a socket client
  - connect to the server
  - send the request
  - receive the response
  - close the connection
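For reference, here are the same steps written out by hand with a raw socket; this sketch is not from the post and uses example.com over plain HTTP on port 80, because a bare socket cannot speak HTTPS without extra TLS code (requests handles that for you):

import socket

sk = socket.socket()                        # create the socket client
sk.connect(('example.com', 80))             # connect to the server
sk.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')  # send the request
chunks = []
while True:                                 # receive the response until the server closes the connection
    data = sk.recv(4096)
    if not data:
        break
    chunks.append(data)
sk.close()                                  # close the connection
print(b''.join(chunks).decode('utf-8', 'ignore')[:200])  # status line and headers come first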

 

3: Example:

import requests

# Simulate a browser sending a request. Internally requests roughly does:
#   sk = socket.socket()   create a socket
#   sk.connect(...)        connect to dig.chouti.com
#   sk.sendall('...')      send the request
#   sk.recv(...)           receive the response
r1 = requests.get(
    url='https://dig.chouti.com/',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
    }
)
print(r1.text)

 

3: bs4 (BeautifulSoup)

Purpose: previously, after downloading a page you had to pull the data out with regular expressions; bs4 does that extraction for you, replacing the regex step.
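A tiny standalone sketch, not from the post, of the bs4 calls used in the crawler below, run against a made-up HTML string:

from bs4 import BeautifulSoup

html = '<div id="content-list"><div class="item"><a href="https://example.com">some title</a></div></div>'
soup = BeautifulSoup(html, 'html.parser')            # parse with the built-in parser
item = soup.find('div', attrs={'class': 'item'})     # first tag matching the name and attributes
a = item.find('a')
print(a.text, a.get('href'))                         # some title https://example.com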

 

Example:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor


def task(url):
    print(url)
    r1 = requests.get(
        url=url,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
        }
    )

    # parse the downloaded HTML; 'html.parser' is the parser to use
    soup = BeautifulSoup(r1.text, 'html.parser')
    print(soup.text)  # r1.text is str; r1.content would be bytes
    content_list = soup.find('div', attrs={'id': 'content-list'})
    for item in content_list.find_all('div', attrs={'class': 'item'}):
        title = item.find('a').text.strip()
        target_url = item.find('a').get('href')
        print(title, target_url)

def run():
    pool = ThreadPoolExecutor(5)  # crawl up to 5 pages concurrently
    for i in range(1, 50):
        pool.submit(task, 'https://dig.chouti.com/all/hot/recent/%s' % i)


if __name__ == '__main__':
    run()

 

posted on 2018-09-12 21:56 by 诚意