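A pitfall with Python's mutable list type

Background

I recently ran into a list pitfall while writing business code, so I am recording it here.

Requirement

Suppose we have an ordinary rule list: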

rule = [["ID",">",0]]

Elsewhere, some computation produces a list called id_lst:

id_lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

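To simulate the real business scenario with a small range of numbers: on top of the rule above, I need to add a second rule list built from id_lst, but each such rule may hold at most 5 IDs.

It is easier to just look at the final assembled result: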

ret = [
    [['ID', '>', 0], ['ID', 'in', [1, 2, 3, 4, 5]]],
    [['ID', '>', 0], ['ID', 'in', [6, 7, 8, 9, 10]]],
    [['ID', '>', 0], ['ID', 'in', [11, 12, 13, 14, 15]]],
    [['ID', '>', 0], ['ID', 'in', [16, 17, 18, 19, 20]]],
]

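In other words, I need to split the numbers in id_lst into groups of 5, put each group into a new rule list, and combine that with the original rule to form a new rule, and so on. Each newly assembled rule then goes into a new list, ret.

Setting the business scenario aside, this can be reduced to a small problem: given rule and id_lst, produce a result shaped like ret.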
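The splitting step on its own can be sketched as follows. The helper name chunked and its size parameter are illustrative and do not come from the original code. Stepping range() by the chunk size means no empty trailing slice is produced.

```python
def chunked(items, size):
    """Split items into consecutive slices of at most `size` elements.

    Hypothetical helper for illustration; not part of the original post.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Stepping by `size` never produces an empty trailing slice.
    return [items[start:start + size] for start in range(0, len(items), size)]


id_lst = list(range(1, 21))
chunks = chunked(id_lst, 5)
print(chunks)
# [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
```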

Incorrect solutions

Faced with a problem like this, the obvious move is to loop, so the first naive version of the program was born:

# -*- coding:utf-8 -*-

rule = [["ID",">",0]]
id_lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
append_lst = []
ret = []

for i in range(len(id_lst)//5 + 1):
    ll = id_lst[i*5:(i+1)*5]
    # print("ll:",ll)
    append_rule = []
    if len(ll) != 0:
        append_rule = ["ID","in",ll]
    append_lst.append(append_rule)

print("append_lst:",append_lst)
# [['ID', 'in', [1, 2, 3, 4, 5]], ['ID', 'in', [6, 7, 8, 9, 10]], ['ID', 'in', [11, 12, 13, 14, 15]], ['ID', 'in', [16, 17, 18, 19, 20]], []]

if len(append_lst) == 1:
    rule += append_lst
    ret.append(rule)
else:
    for i in append_lst:
        # Skip empty entries
        if i == []:
            append_lst.pop(append_lst.index(i))
            continue
        rule.append(i)
        ret.append(rule)
        rule.pop()
print(">>>>>ret:",ret)
# [[['ID', '>', 0]], [['ID', '>', 0]], [['ID', '>', 0]], [['ID', '>', 0]]]

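The result is not at all what I expected!

At first I thought my for-loop "technique" was wrong, so I tried a "stack"-style approach instead (strictly speaking, pop(0) removes from the front, so it behaves like a queue):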

# -*- coding:utf-8 -*-
rule = [["ID",">",0]]
id_lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
append_lst = []
ret = []

for i in range(len(id_lst)//5 + 1):
    ll = id_lst[i*5:(i+1)*5]
    # print("ll:",ll)
    append_rule = []
    if len(ll) != 0:
        append_rule = ["ID","in",ll]
    append_lst.append(append_rule)

print("append_lst:",append_lst)
# [['ID', 'in', [1, 2, 3, 4, 5]], ['ID', 'in', [6, 7, 8, 9, 10]], ['ID', 'in', [11, 12, 13, 14, 15]], ['ID', 'in', [16, 17, 18, 19, 20]], []]
# "Stack"-style approach (pop(0) takes from the front of the list)
while append_lst:
    a_l = append_lst.pop(0)
    # Skip empty lists
    if a_l:
        rule.append(a_l)
        ret.append(rule)
        rule.pop()

print(">>>>>ret:",ret)
# [[['ID', '>', 0]], [['ID', '>', 0]], [['ID', '>', 0]], [['ID', '>', 0]]]

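Problem analysis

Looking closely, the for-loop version and the stack version follow the same idea: on each iteration, append one list element from append_lst to rule, then append the resulting rule to ret. Finally, because we want to reuse the rule list, we pop the element just appended to rule to avoid duplicate data, and then start the next round.

So I stepped through it in PyCharm and finally found the problem: the rule list appended to ret on each iteration is the very same list object as the original rule! (Surprise!) ret.append(rule) only stores a reference to rule, so the final rule.pop() changes rule itself and, with it, every rule previously appended to ret.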
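The aliasing can be reproduced in a few lines. This is a stripped-down sketch of the same append-then-pop pattern, using a shorter ID list than the post purely for brevity:

```python
rule = [["ID", ">", 0]]
ret = []

rule.append(["ID", "in", [1, 2, 3]])
ret.append(rule)          # stores a reference to rule, not a copy
print(ret[0] is rule)     # True
rule.pop()                # mutates the one shared list object
print(ret)                # [[['ID', '>', 0]]]

# Appending again just adds another reference to the same object.
ret.append(rule)
print(ret[0] is ret[1])   # True
```

Every entry in ret is the same object, so whatever rule looks like at the end is what all of them show.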

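Below is my debugging session; watch closely how the value of ret on the second-to-last line changes:

[Figure: PyCharm debugger screenshots]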

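Solution

Since the rule list stored in ret is the same object as the original rule, we can use deepcopy to put a copy of the temporarily built list into ret. That way, rule.pop() no longer changes the list already appended to ret: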

# -*- coding:utf-8 -*-
from copy import deepcopy


rule = [["ID",">",0]]
id_lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
append_lst = []
ret = []

for i in range(len(id_lst)//5 + 1):
    ll = id_lst[i*5:(i+1)*5]
    # print("ll:",ll)
    append_rule = []
    if len(ll) != 0:
        append_rule = ["ID","in",ll]
    append_lst.append(append_rule)

print("append_lst:",append_lst)
# [['ID', 'in', [1, 2, 3, 4, 5]], ['ID', 'in', [6, 7, 8, 9, 10]], ['ID', 'in', [11, 12, 13, 14, 15]], ['ID', 'in', [16, 17, 18, 19, 20]], []]
# If append_lst has exactly one entry, no splitting is needed; the two cases are kept separate here to ease later maintenance
if len(append_lst) == 1:
    rule += append_lst
    ret.append(rule)
else:
    for i in append_lst:
        # Skip empty entries; popping from append_lst while iterating over it can skip elements
        if i == []:
            continue
        rule.append(i)
        current_lst = deepcopy(rule)
        ret.append(current_lst)
        rule.pop()
print(">>>>>ret:",ret)
# [[['ID', '>', 0], ['ID', 'in', [1, 2, 3, 4, 5]]], [['ID', '>', 0], ['ID', 'in', [6, 7, 8, 9, 10]]], [['ID', '>', 0], ['ID', 'in', [11, 12, 13, 14, 15]]], [['ID', '>', 0], ['ID', 'in', [16, 17, 18, 19, 20]]]]
for i in ret:
    print(i)
"""
[['ID', '>', 0], ['ID', 'in', [1, 2, 3, 4, 5]]]
[['ID', '>', 0], ['ID', 'in', [6, 7, 8, 9, 10]]]
[['ID', '>', 0], ['ID', 'in', [11, 12, 13, 14, 15]]]
[['ID', '>', 0], ['ID', 'in', [16, 17, 18, 19, 20]]]
"""

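Remaining issues

I previously wrote a blog post on assignment and copying in Python: Python3中的赋值操作、浅拷贝与深拷贝 (Assignment, shallow copy, and deep copy in Python 3)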
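As a quick reminder of the three behaviors, here is a minimal sketch using the standard copy module. The variable names alias, shallow, and deep are chosen for illustration only:

```python
import copy

rule = [["ID", ">", 0]]

alias = rule                  # assignment: same object, no copy at all
shallow = copy.copy(rule)     # new outer list, inner lists still shared
deep = copy.deepcopy(rule)    # new outer list and new inner lists

rule.append(["ID", "in", [1, 2]])   # changes the outer list
rule[0][2] = 99                     # changes an inner list in place

print(alias)    # [['ID', '>', 99], ['ID', 'in', [1, 2]]]
print(shallow)  # [['ID', '>', 99]]
print(deep)     # [['ID', '>', 0]]
```

The shallow copy is protected from append and pop on rule, but not from in-place changes to the inner lists; only the deep copy is fully independent.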

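Experienced readers may have spotted a problem in the code above: if id_lst is very large in practice (hundreds of thousands of entries or more) and we split it in chunks of ten thousand, each current_lst produced by deepcopy takes up a fair amount of memory, because deepcopy also duplicates the nested chunk of IDs.

If you have a better solution, feel free to leave a comment below so we can discuss it.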
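One possible direction, offered as a sketch rather than a definitive answer: never mutate rule at all, and build a new outer list for each chunk with the + operator. The slice already creates a fresh chunk, so no deepcopy is needed. The generator name build_rules and its size parameter are assumptions made for this example.

```python
def build_rules(rule, id_lst, size):
    """Yield one combined rule list per chunk of id_lst.

    Hypothetical helper for illustration; not part of the original post.
    """
    for start in range(0, len(id_lst), size):
        chunk = id_lst[start:start + size]   # slicing already creates a new list
        # rule + [...] builds a new outer list; rule itself is never mutated
        yield rule + [["ID", "in", chunk]]


rule = [["ID", ">", 0]]
id_lst = list(range(1, 21))

ret = list(build_rules(rule, id_lst, 5))
for item in ret:
    print(item)
# [['ID', '>', 0], ['ID', 'in', [1, 2, 3, 4, 5]]]
# [['ID', '>', 0], ['ID', 'in', [6, 7, 8, 9, 10]]]
# [['ID', '>', 0], ['ID', 'in', [11, 12, 13, 14, 15]]]
# [['ID', '>', 0], ['ID', 'in', [16, 17, 18, 19, 20]]]
print(rule)
# [['ID', '>', 0]]
```

Two caveats apply. The inner ['ID', '>', 0] list is shared by every entry of ret, so this only works if downstream code does not modify it in place. And when memory is the concern, iterating over build_rules(...) directly instead of wrapping it in list() processes one chunk at a time without holding all the combined rules at once.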

posted on 2019-12-17 15:53 江湖乄夜雨