[forward] Quick Python zlib vs bz2 benchmark
// http://log.bthomson.com/2011/01/quick-python-gzip-vs-bz2-benchmark.html
The test file was a plaintext book (pg4238.txt in the code below), a highly compressible source. Columns are: level, time, bytes uncompressed, bytes compressed, ratio. The decompression rows show only level and time.
```
% ./bench.zsh
zlib compress
0   6.98ms 640599 640700 1.000
1  21.22ms 640599 274195 2.336
2  25.08ms 640599 261638 2.448
3  34.24ms 640599 249649 2.566
4  36.41ms 640599 241500 2.653
5  54.24ms 640599 232545 2.755
6  77.22ms 640599 228621 2.802
7  87.94ms 640599 228032 2.809
8 112.49ms 640599 227622 2.814
9 113.03ms 640599 227622 2.814

zlib decompress
0   1.54ms
1   6.39ms
2   6.13ms
3   6.02ms
4   6.22ms
5   5.96ms
6   5.94ms
7   5.90ms
8   5.89ms
9   5.94ms

bz2 compress
1 105.30ms 640599 196752 3.256
2 103.42ms 640599 186082 3.443
3 105.40ms 640599 180905 3.541
4 104.95ms 640599 177642 3.606
5 113.12ms 640599 176232 3.635
6 110.45ms 640599 173153 3.700
7 113.06ms 640599 169634 3.776
8 110.27ms 640599 169634 3.776
9 111.43ms 640599 169634 3.776

bz2 decompress
1  36.40ms
2  35.79ms
3  36.35ms
4  36.81ms
5  41.18ms
6  44.86ms
7  48.96ms
8  48.45ms
9  47.95ms
```
Conclusion: probably not worth it. bz2 at level=4 takes about 6 times longer to decompress than zlib at level=9 (36.81 ms versus 5.94 ms) for only a modest improvement in the compression ratio, from 2.8 to 3.6.
Interestingly, for write-heavy workloads bz2 may actually be the better choice, since its compression time (about 105 ms at level=4) is no worse than zlib at level=9 (113 ms).
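The numbers behind this conclusion can be rechecked with a little arithmetic. The following is a small Python 3 sketch; the figures are copied by hand from the benchmark output above, and the variable names are mine rather than the author's.

```python
# Figures copied from the benchmark output above.
zlib9_decompress_ms = 5.94   # zlib level 9, decompress
bz2_4_decompress_ms = 36.81  # bz2 level 4, decompress
zlib9_compress_ms = 113.03   # zlib level 9, compress
bz2_4_compress_ms = 104.95   # bz2 level 4, compress
zlib9_ratio = 2.814          # compression ratio, zlib level 9
bz2_4_ratio = 3.606          # compression ratio, bz2 level 4

# How many times slower bz2 is when decompressing.
decompress_slowdown = bz2_4_decompress_ms / zlib9_decompress_ms
# bz2 compression time as a fraction of zlib's.
compress_relative = bz2_4_compress_ms / zlib9_compress_ms
# How much smaller the bz2 output is than the zlib output.
size_saving = 1 - zlib9_ratio / bz2_4_ratio

print("bz2 decompress slowdown: %.1fx" % decompress_slowdown)        # 6.2x
print("bz2 compress time relative to zlib: %.2fx" % compress_relative)  # 0.93x
print("bz2 output is %.0f%% smaller" % (100 * size_saving))          # 22%
```

So, on this one file, bz2 buys roughly a fifth less storage at the cost of about six times slower reads.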
I think it's better not to use the timeit module for this kind of benchmark, since in typical usage you will just be compressing or decompressing some given data once. If the operations speed up on repeated runs due to caching (and they do), that doesn't reflect typical usage. Starting a new Python process for each test seems to reduce cache effects.
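One way to see what repeated runs look like is to time the same call several times inside a single process. This is a minimal Python 3 sketch, not the author's code: the payload is a synthetic stand-in for the book text, and `time_compress_ms` is a hypothetical helper name. Whether the first run comes out slower than the rest will depend on the machine and the data.

```python
import time
import zlib

# Synthetic, highly repetitive payload; a stand-in for the book text.
data = b"the quick brown fox jumps over the lazy dog\n" * 20000

def time_compress_ms(payload, level):
    """Return the wall-clock time of one zlib.compress call, in milliseconds."""
    start = time.perf_counter()
    zlib.compress(payload, level)
    return 1000 * (time.perf_counter() - start)

# Five consecutive runs inside the same process.
timings = [time_compress_ms(data, 6) for _ in range(5)]
for run, ms in enumerate(timings, 1):
    print("run %d: %6.2fms" % (run, ms))
```

A timeit-style harness would typically report the fastest or the average of such runs, which is exactly the figure the paragraph above argues is unrepresentative for one-shot use.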
Anyway, here is the code.
```python
# bench.py (Python 2: uses the print statement)
# Usage: python bench.py LEVEL USE_ZLIB IS_DECOMPRESS
import zlib
import bz2
import time
import sys

level = int(sys.argv[1])
mod = zlib if int(sys.argv[2]) else bz2   # 1 selects zlib, 0 selects bz2
is_decompress = int(sys.argv[3])

with open("pg4238.txt") as f:
    data = f.read()

# For a decompression test, prepare the compressed input before timing starts.
if is_decompress:
    c_data = mod.compress(data, level)

t = time.time()
if is_decompress:
    data = mod.decompress(c_data)
else:
    c_data = mod.compress(data, level)

# The trailing comma keeps the size columns on the same output line.
print level, "%6.02fms" % (1000*(time.time() - t)),
if not is_decompress:
    print len(data), len(c_data), "%.03f" % (float(len(data))/len(c_data))
```
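The script above is Python 2. For readers on Python 3, here is a rough sketch of the same measurement wrapped in a function. It is my adaptation, not the author's code: `bench` and `sample` are hypothetical names, `time.perf_counter` replaces `time.time`, and the input is synthetic bytes. To reproduce the original setup you would read the book with `open("pg4238.txt", "rb")`, since both modules expect bytes in Python 3.

```python
import bz2
import time
import zlib

def bench(mod, data, level, is_decompress):
    """Time one compress or decompress call; return (milliseconds, compressed size)."""
    if is_decompress:
        # Prepare the compressed input outside the timed region.
        c_data = mod.compress(data, level)
    start = time.perf_counter()
    if is_decompress:
        mod.decompress(c_data)
    else:
        c_data = mod.compress(data, level)
    elapsed_ms = 1000 * (time.perf_counter() - start)
    return elapsed_ms, len(c_data)

# Synthetic stand-in for the book text (an assumption, not the original file).
sample = b"It was the best of times, it was the worst of times.\n" * 10000

for name, mod in (("zlib", zlib), ("bz2", bz2)):
    ms, size = bench(mod, sample, 9, False)
    print("%s 9 %6.2fms %d %d %.3f" % (name, ms, len(sample), size, len(sample) / size))
```

Note that calling `bench` repeatedly in one process reintroduces the repeat-run effects the post tries to avoid, so this is only a convenience for quick experiments.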
```zsh
#!/usr/bin/zsh
echo 'zlib compress'
for level in {0..9}; do python bench.py $level 1 0; done
echo '\nzlib decompress'
for level in {0..9}; do python bench.py $level 1 1; done
echo '\nbz2 compress'
for level in {1..9}; do python bench.py $level 0 0; done
echo '\nbz2 decompress'
for level in {1..9}; do python bench.py $level 0 1; done
```
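The same one-process-per-measurement idea can be driven from Python instead of zsh. This is a hedged Python 3 sketch, not part of the original post: `CHILD` and `run_fresh` are hypothetical names, the child program compresses a synthetic payload rather than the book, and only zlib is shown.

```python
import subprocess
import sys

# Inline child program: compress a synthetic payload once and print the time in ms.
CHILD = """
import sys, time, zlib
level = int(sys.argv[1])
data = b"hello world, hello compression\\n" * 20000
t = time.perf_counter()
zlib.compress(data, level)
print("%.2f" % (1000 * (time.perf_counter() - t)))
"""

def run_fresh(level):
    """Run one measurement in a brand-new interpreter; return the time in ms."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(level)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout)

for level in (1, 6, 9):
    print(level, "%6.2fms" % run_fresh(level))
```

Each call to `run_fresh` pays the interpreter start-up cost, but that cost falls outside the timed region, just as it does in the shell loop above.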
