Comparing two Python multithreading approaches: sometimes multithreading becomes a burden

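First, the task: I had about 560,000 lines of data to clean. As the code below shows, the part of each line from "/common" onward is replaced by a port number looked up from the part of the line starting at "W12", with 9999 as the fallback when the lookup fails.

The data is read from one txt file and the cleaned result is written to another. The data set used to be only a few thousand lines, and I always used a plain single-threaded script, which was fast enough. Today I wanted to clean the 560,000 lines, so I thought of multithreading.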

Method 1:

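import datetime
import re
import threading
from queue import Queue

import zidian  # module not shown in this post; zi_dian() is used as a lookup dict


def huoqu(file):
    # Bounded queue: put() blocks once 10000 items are waiting to be consumed.
    ts_queue = Queue(10000)
    with open(file, 'r') as f:
        t = f.read()
        IP = t.split('\n')
        for i in IP:
            ts_queue.put(i)
        return ts_queue


def qingxi(ts_queue):
    while not ts_queue.empty():
        i = ts_queue.get()
        # Lookup key: everything from "W12" to the end of the line.
        port_1 = re.findall(r"W12.*", i)
        port_1 = ''.join(port_1)
        try:
            t = zidian.zi_dian()
            port = str(t[port_1])
        except Exception:
            port = '9999'  # fallback when the lookup fails
        # Replace everything from "/common" onward with the port.
        port_2 = re.findall(r"/common.*", i)
        port_2 = ''.join(port_2)
        IP = i.replace(port_2, port)
        with open('IP3.txt', 'a+') as g:
            g.write(IP)
            g.write('\n')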

# Single-threaded pass at module level; it runs before the threaded code below
# and writes its result to IP2.txt.
with open('IP.txt', 'r') as f:
    t = f.read()
    IP = t.split('\n')
    heji = []
    for i in IP:
        port_1 = re.findall(r"W12.*", i)
        port_1 = ''.join(port_1)
        try:
            t = zidian.zi_dian()
            port = str(t[port_1])
        except Exception:
            port = '9999'
        port_2 = re.findall(r"/common.*", i)
        port_2 = ''.join(port_2)
        IP = i.replace(port_2, port)
        heji.append(IP)
        #print(IP)
    heji.pop()  # drop the entry produced by the last element of the split
    for i in heji:
        with open('IP2.txt', 'a+') as g:
            g.write(i)
            g.write('\n')


if __name__ == "__main__":
    start = datetime.datetime.now().replace(microsecond=0)
    print('Start: reading the list')
    t = 'IP.txt'
    s = huoqu(t)  # fills the queue before any worker thread exists
    threads = []
    for i in range(100):
        t = threading.Thread(target=qingxi, name='th-' + str(i), kwargs={'ts_queue': s})
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    end = datetime.datetime.now().replace(microsecond=0)
    print('Elapsed time: ' + str(end - start))

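This is the multithreading pattern I normally like best: non-blocking and the fastest. Today, though, it hung and made no progress. The cause is in huoqu(): the line

return ts_queue
is only reached after all 560,000 or so lines have been put on the queue, but the queue is created as Queue(10000) and no worker thread is running yet, so put() blocks for good once 10,000 items are waiting.
The version without threads at least runs, only slowly; this one did not work at all.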
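One way to avoid the hang is to start the worker threads before filling the queue, so that a bounded queue is drained while it is being filled. The following is a minimal sketch of that idea, not the script from this post: clean() is a stand-in for the real cleaning step, and STOP, worker() and run() are hypothetical names introduced only for illustration.

```python
import queue
import threading

STOP = object()  # hypothetical marker that tells a worker to exit


def clean(line):
    # Stand-in for the post's cleaning step (assumption: any pure function).
    return line.strip().upper()


def worker(tasks, results, lock):
    while True:
        item = tasks.get()          # blocks until an item is available
        if item is STOP:
            break
        cleaned = clean(item)
        with lock:                  # guard the shared result list
            results.append(cleaned)


def run(lines, n_workers=4, maxsize=100):
    tasks = queue.Queue(maxsize)    # bounded is fine: consumers are already running
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()                   # start the consumers BEFORE producing
    for line in lines:
        tasks.put(line)             # may wait briefly while workers drain the queue
    for _ in threads:
        tasks.put(STOP)             # one stop marker per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run(["line%d" % i for i in range(1000)], n_workers=4, maxsize=10)
    print(len(out))
```

The results come back in whatever order the workers finish, so this sketch does not preserve the input order. Creating the queue with Queue() and no size limit would also remove the hang, at the cost of holding every line in memory at once.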

So I moved on to a second approach, and used the built-in map() function for the cleaning step to improve efficiency:

def square(x):
    port_1 = re.findall(r"W12.*", x)
    port_1 = ''.join(port_1)
    try:
        t = zidian.zi_dian()
        port = str(t[port_1])
    except:
        port = '9999'
    port_2 = re.findall(r"/common.*", x)
    port_2 = ''.join(port_2)
    IP = x.replace(port_2, port)
    return IP
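def main():
    with open('IP.txt', 'r') as f:
        t = f.read()
        IP = t.split('\n')
        IP.pop()  # drop the last element (the empty string if the file ends with a newline)
        res = map(square, IP)  # in Python 3 map() is lazy: lines are cleaned as the loop consumes them
        t_list = []
        for ip_port in res:
            # One thread is created and started per cleaned line.
            t = threading.Thread(target=is_enable, args=(ip_port,))
            t.start()
            t_list.append(t)
        for t in t_list:
            t.join()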


def is_enable(ip_port):
    with open('IP3.txt', 'a+')as g:
        g.write(ip_port)
        g.write('\n')

if __name__ == '__main__':
    start = datetime.datetime.now().replace(microsecond=0)
    main()
    end = datetime.datetime.now().replace(microsecond=0)
    print('Elapsed time: ' + str(end - start))
    # Elapsed time: 0:05:14

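This swaps in another quick multithreading pattern: the cleaning goes through the built-in map(), and only the file writes use threads. It still took more than 5 minutes. The extra for loops start and join one thread per line, about 560,000 threads in total, and that overhead dragged the speed down badly. Multithreading is simply unnecessary here, and it made things slower.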
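The cost of one thread per item can be seen in a small timing sketch. This is an illustration under stated assumptions, not a measurement of the script above: record() stands in for the per-line write and appends to a list instead of a file, and all function names are hypothetical.

```python
import threading
import time


def record(item, out):
    # Stand-in for the per-line write (assumption: a list instead of a file).
    out.append(item)


def with_threads(items):
    out = []
    threads = []
    for item in items:
        t = threading.Thread(target=record, args=(item, out))
        t.start()               # one thread created and started per item
        threads.append(t)
    for t in threads:
        t.join()
    return out


def without_threads(items):
    out = []
    for item in items:
        record(item, out)       # plain function call, no thread
    return out


def timed(fn, items):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(items)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = [str(i) for i in range(2000)]
    _, t_threads = timed(with_threads, items)
    _, t_plain = timed(without_threads, items)
    print(f"one thread per item: {t_threads:.4f}s, plain loop: {t_plain:.4f}s")
```

The exact numbers depend on the machine, but both functions do the same work, so any difference in time comes from creating, starting and joining the threads.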

For comparison, here is the version without multithreading:

def square(x):
    port_1 = re.findall(r"W12.*", x)
    port_1 = ''.join(port_1)
    try:
        t = zidian.zi_dian()
        port = str(t[port_1])
    except:
        port = '9999'
    port_2 = re.findall(r"/common.*", x)
    port_2 = ''.join(port_2)
    IP = x.replace(port_2, port)
    return IP
start = datetime.datetime.now().replace(microsecond=0)
with open('IP.txt', 'r') as f:
    t = f.read()
    IP = t.split('\n')
    IP.pop()  # drop the last element (the empty string if the file ends with a newline)
    res = map(square, IP)
    for i in res:
        # The output file is reopened in append mode for every line.
        with open('IP3.txt', 'a+') as g:
            g.write(i)
            g.write('\n')
    end = datetime.datetime.now().replace(microsecond=0)
    print('Elapsed time: ' + str(end - start))
    # Elapsed time: 0:03:52

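Clearly faster: 3 minutes 52 seconds, so about 4 minutes, against 5 minutes 14 seconds with threads. With 560,000 lines, the extra for loops that manage the threads cost a lot of time. Each of the two multithreading approaches has its merits, but when the data set is large and the job is writing to a file, it is better not to use threads at all.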
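The single-threaded version itself probably still has room to improve, because it reopens the output file and calls zidian.zi_dian() for every line. Below is a minimal sketch that builds the lookup table once and opens each file once. It is an assumption-laden illustration: PORTS is a hypothetical stand-in for whatever zidian.zi_dian() returns, the sample line format is invented, and clean_line() and clean_file() are names introduced here.

```python
import re

# Hypothetical lookup table standing in for zidian.zi_dian() (not shown in the post).
PORTS = {"W12-a": 8080}

KEY_RE = re.compile(r"W12.*")       # same patterns as the post, compiled once
TAIL_RE = re.compile(r"/common.*")


def clean_line(line, ports):
    key = ''.join(KEY_RE.findall(line))
    port = str(ports.get(key, '9999'))      # 9999 when the key is unknown
    tail = ''.join(TAIL_RE.findall(line))
    # Guard: str.replace with an empty pattern would insert the port
    # between every character, so leave lines without a tail unchanged.
    return line.replace(tail, port) if tail else line


def clean_file(src, dst, ports):
    # Open both files once; note 'w' overwrites dst, unlike the post's 'a+'.
    with open(src, 'r') as fin, open(dst, 'w') as fout:
        for line in fin:
            line = line.rstrip('\n')
            if line:
                fout.write(clean_line(line, ports) + '\n')
```

With the invented sample format, clean_line("10.0.0.1:/common/W12-a", PORTS) gives "10.0.0.1:8080", and a line whose key is missing from the table ends in 9999.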

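To go faster, the list could be split into several parts, each written to a file separately, but the output order would change and the load on the machine would be higher.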
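The splitting idea can be sketched with concurrent.futures. In this sketch the chunks are cleaned by a small thread pool and the results are collected with Executor.map, which returns results in submission order, so the order problem does not arise as long as a single writer outputs the combined list. clean_chunk() is a stand-in for the real cleaning step, and chunks() and clean_in_chunks() are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def clean_chunk(chunk):
    # Stand-in for the post's cleaning step (assumption).
    return [line.strip().lower() for line in chunk]


def clean_in_chunks(lines, size=1000, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted,
        # regardless of which worker finishes first.
        cleaned = pool.map(clean_chunk, chunks(lines, size))
        return [line for chunk in cleaned for line in chunk]


if __name__ == "__main__":
    sample = [" Line%d " % i for i in range(2500)]
    print(clean_in_chunks(sample)[:3])
```

In standard CPython the global interpreter lock keeps threads from running Python code in parallel, so this layout is unlikely to speed up the regular-expression work itself; it mainly shows how to keep the number of threads small and the output order stable.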

posted @ 2020-12-13 15:13 凹凸曼大人
