Revisiting the Game 2048 - AI Methods - Origins and Endings (7) - A Python Implementation of the TDL Algorithm for 2048
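Play 2048 online:
The effort to solve 2048 traces back to a discussion thread on a foreign forum. That thread was the earliest discussion of how to solve the game, so it is the "origin" in this series' title: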
What is the optimal algorithm for the game 2048?
I have already written about this game in earlier posts:
Revisiting the Game 2048 - AI Methods - Origins and Endings (1) - Running the Game Automatically in Firefox
===========================================
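I found a C++ implementation of a TDL solver for 2048 online: https://github.com/moporgic/TDL2048-Demo
This post presents a Python version of the TDL algorithm that I rewrote from that implementation.
Code for the Python version:
https://gitee.com/devilmaycry812839668/tdl2048-python-demo
This is a Python rewrite of another developer's TDL solver for 2048, and its performance cannot match the original C++ implementation. The value of this repository lies in demonstrating the code logic, not in runtime performance. The TDL algorithm follows the paper cited in the README, but it is not a strict implementation: only part of the algorithm from the paper is implemented.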
=============================================
Results of the original C++ version (source: https://github.com/moporgic/TDL2048-Demo ):






================================================
The Python rewrite, following the logic and parameter settings of the C++ version:
Python code:
https://gitee.com/devilmaycry812839668/tdl2048-python-demo
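Results (every 1000 training episodes the program prints one block: the episode count, then the mean and maximum score of those 1000 games; each following line gives a tile value, the percentage of games that reached at least that tile, and in parentheses the percentage of games whose largest tile was exactly that value. The real/user/sys lines at the end of each run are the timings reported by the shell's time command):
Run 1: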

90000 mean = 51467 max = 171436
128 100.0% ( 0.6%)
256 99.4% ( 0.5%)
512 98.9% ( 2.2%)
1024 96.7% (17.5%)
2048 79.2% (33.6%)
4096 45.6% (39.5%)
8192 6.1% ( 6.1%)
91000 mean = 52003 max = 175620
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 0.5%)
512 99.3% ( 3.0%)
1024 96.3% (16.7%)
2048 79.6% (33.3%)
4096 46.3% (39.7%)
8192 6.6% ( 6.6%)
92000 mean = 51713 max = 179520
64 100.0% ( 0.1%)
256 99.9% ( 0.6%)
512 99.3% ( 3.4%)
1024 95.9% (16.3%)
2048 79.6% (33.0%)
4096 46.6% (40.3%)
8192 6.3% ( 6.3%)
93000 mean = 51767 max = 184252
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 1.1%)
512 98.5% ( 3.6%)
1024 94.9% (16.2%)
2048 78.7% (32.7%)
4096 46.0% (39.2%)
8192 6.8% ( 6.8%)
94000 mean = 52040 max = 177072
64 100.0% ( 0.4%)
128 99.6% ( 0.3%)
256 99.3% ( 1.4%)
512 97.9% ( 4.0%)
1024 93.9% (16.9%)
2048 77.0% (30.6%)
4096 46.4% (38.7%)
8192 7.7% ( 7.7%)
95000 mean = 53317 max = 177396
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.1%)
256 99.6% ( 0.5%)
512 99.1% ( 1.8%)
1024 97.3% (14.8%)
2048 82.5% (33.8%)
4096 48.7% (41.6%)
8192 7.1% ( 7.1%)
96000 mean = 55690 max = 174236
256 100.0% ( 0.4%)
512 99.6% ( 1.8%)
1024 97.8% (12.1%)
2048 85.7% (34.7%)
4096 51.0% (43.7%)
8192 7.3% ( 7.3%)
97000 mean = 52997 max = 177124
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 1.0%)
512 98.8% ( 3.0%)
1024 95.8% (16.9%)
2048 78.9% (31.9%)
4096 47.0% (39.5%)
8192 7.5% ( 7.5%)
98000 mean = 52491 max = 178280
128 100.0% ( 0.6%)
256 99.4% ( 1.1%)
512 98.3% ( 2.8%)
1024 95.5% (15.3%)
2048 80.2% (33.1%)
4096 47.1% (40.0%)
8192 7.1% ( 7.1%)
99000 mean = 52579 max = 231680
128 100.0% ( 0.2%)
256 99.8% ( 1.5%)
512 98.3% ( 1.9%)
1024 96.4% (17.0%)
2048 79.4% (32.3%)
4096 47.1% (39.8%)
8192 7.3% ( 7.2%)
16384 0.1% ( 0.1%)
100000 mean = 54323 max = 176672
64 100.0% ( 0.2%)
256 99.8% ( 0.4%)
512 99.4% ( 2.1%)
1024 97.3% (15.5%)
2048 81.8% (31.6%)
4096 50.2% (42.8%)
8192 7.4% ( 7.4%)
real 299m7.984s
user 299m2.076s
sys 0m4.188s
Run 2:

90000 mean = 51416 max = 169020
128 100.0% ( 0.3%)
256 99.7% ( 1.0%)
512 98.7% ( 3.9%)
1024 94.8% (17.8%)
2048 77.0% (30.5%)
4096 46.5% (39.9%)
8192 6.6% ( 6.6%)
91000 mean = 52111 max = 176432
32 100.0% ( 0.1%)
256 99.9% ( 0.7%)
512 99.2% ( 2.8%)
1024 96.4% (14.9%)
2048 81.5% (33.0%)
4096 48.5% (43.0%)
8192 5.5% ( 5.5%)
92000 mean = 52641 max = 175848
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.7%)
512 99.0% ( 2.9%)
1024 96.1% (15.2%)
2048 80.9% (34.0%)
4096 46.9% (40.4%)
8192 6.5% ( 6.5%)
93000 mean = 53224 max = 177336
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 0.9%)
512 98.7% ( 3.0%)
1024 95.7% (14.4%)
2048 81.3% (32.5%)
4096 48.8% (41.6%)
8192 7.2% ( 7.2%)
94000 mean = 53501 max = 181752
128 100.0% ( 0.2%)
256 99.8% ( 0.6%)
512 99.2% ( 2.3%)
1024 96.9% (15.5%)
2048 81.4% (32.1%)
4096 49.3% (42.9%)
8192 6.4% ( 6.4%)
95000 mean = 54450 max = 173708
64 100.0% ( 0.3%)
128 99.7% ( 0.2%)
256 99.5% ( 0.5%)
512 99.0% ( 2.8%)
1024 96.2% (15.3%)
2048 80.9% (29.9%)
4096 51.0% (43.8%)
8192 7.2% ( 7.2%)
96000 mean = 55262 max = 226556
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 0.4%)
512 99.1% ( 2.6%)
1024 96.5% (15.6%)
2048 80.9% (30.3%)
4096 50.6% (42.3%)
8192 8.3% ( 8.2%)
16384 0.1% ( 0.1%)
97000 mean = 53725 max = 177588
128 100.0% ( 0.1%)
256 99.9% ( 0.1%)
512 99.8% ( 3.4%)
1024 96.4% (14.9%)
2048 81.5% (32.4%)
4096 49.1% (42.0%)
8192 7.1% ( 7.1%)
98000 mean = 54402 max = 176952
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 1.0%)
512 98.7% ( 2.4%)
1024 96.3% (15.6%)
2048 80.7% (31.6%)
4096 49.1% (40.9%)
8192 8.2% ( 8.2%)
99000 mean = 55552 max = 176284
64 100.0% ( 0.2%)
128 99.8% ( 0.2%)
256 99.6% ( 0.9%)
512 98.7% ( 2.2%)
1024 96.5% (14.7%)
2048 81.8% (29.5%)
4096 52.3% (44.5%)
8192 7.8% ( 7.8%)
100000 mean = 55576 max = 174112
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.6%)
512 99.1% ( 1.4%)
1024 97.7% (15.5%)
2048 82.2% (31.2%)
4096 51.0% (43.1%)
8192 7.9% ( 7.9%)
real 306m48.606s
user 306m42.380s
sys 0m4.352s
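Run 3 (from this run on, each block also reports the mean, minimum and maximum of all n-tuple network weights):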

90000 mean = 48991 max = 158096
parameter mean = 1.5322926465899345 min = -12611.286509352407 max = 5502.92779377629
128 100.0% ( 0.1%)
256 99.9% ( 0.9%)
512 99.0% ( 4.1%)
1024 94.9% (15.4%)
2048 79.5% (36.0%)
4096 43.5% (38.2%)
8192 5.3% ( 5.3%)
91000 mean = 49475 max = 175724
parameter mean = 1.5583559250936798 min = -12611.286509352407 max = 5561.9897442909605
128 100.0% ( 0.2%)
256 99.8% ( 1.0%)
512 98.8% ( 3.7%)
1024 95.1% (15.7%)
2048 79.4% (34.6%)
4096 44.8% (40.2%)
8192 4.6% ( 4.6%)
92000 mean = 47784 max = 162568
parameter mean = 1.5773082791168676 min = -12611.286509352407 max = 5678.766629401056
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 1.3%)
512 98.2% ( 3.5%)
1024 94.7% (18.3%)
2048 76.4% (36.3%)
4096 40.1% (34.3%)
8192 5.8% ( 5.8%)
93000 mean = 50350 max = 156552
parameter mean = 1.6007116787981626 min = -12611.286509352407 max = 5777.117118806266
128 100.0% ( 0.1%)
256 99.9% ( 0.9%)
512 99.0% ( 3.8%)
1024 95.2% (16.1%)
2048 79.1% (33.4%)
4096 45.7% (41.2%)
8192 4.5% ( 4.5%)
94000 mean = 50930 max = 162044
parameter mean = 1.624377595843311 min = -12611.286509352407 max = 5785.769207357467
128 100.0% ( 0.1%)
256 99.9% ( 0.8%)
512 99.1% ( 2.6%)
1024 96.5% (15.9%)
2048 80.6% (35.1%)
4096 45.5% (39.8%)
8192 5.7% ( 5.7%)
95000 mean = 51721 max = 176832
parameter mean = 1.644080028640483 min = -12611.286509352407 max = 5738.999075391805
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 3.3%)
1024 96.0% (15.3%)
2048 80.7% (33.7%)
4096 47.0% (39.5%)
8192 7.5% ( 7.5%)
96000 mean = 51399 max = 163788
parameter mean = 1.6663415442828855 min = -12611.286509352407 max = 5845.513382853591
128 100.0% ( 0.3%)
256 99.7% ( 0.5%)
512 99.2% ( 2.5%)
1024 96.7% (16.1%)
2048 80.6% (34.8%)
4096 45.8% (39.7%)
8192 6.1% ( 6.1%)
97000 mean = 51230 max = 166544
parameter mean = 1.686732530926021 min = -12611.286509352407 max = 5774.039565288521
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.9%)
1024 96.4% (15.1%)
2048 81.3% (34.7%)
4096 46.6% (40.9%)
8192 5.7% ( 5.7%)
98000 mean = 52740 max = 160900
parameter mean = 1.7095415726931256 min = -12563.946177760263 max = 5843.970362728986
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.8%)
1024 96.5% (13.5%)
2048 83.0% (33.7%)
4096 49.3% (44.0%)
8192 5.3% ( 5.3%)
99000 mean = 52669 max = 169220
parameter mean = 1.7307740083579728 min = -12476.152646821502 max = 5989.542236493472
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.5%)
512 99.2% ( 3.6%)
1024 95.6% (16.2%)
2048 79.4% (30.9%)
4096 48.5% (41.7%)
8192 6.8% ( 6.8%)
100000 mean = 50804 max = 177192
parameter mean = 1.7455929446279508 min = -12425.416359450286 max = 5934.462230068022
128 100.0% ( 0.2%)
256 99.8% ( 0.8%)
512 99.0% ( 3.2%)
1024 95.8% (16.5%)
2048 79.3% (33.0%)
4096 46.3% (41.1%)
8192 5.2% ( 5.2%)
real 296m31.332s
user 296m26.503s
sys 0m3.752s
Run 4:

90000 mean = 52461 max = 173552
parameter mean = 1.6053481665698048 min = -7208.176952683409 max = 6427.410230296034
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 0.7%)
512 99.1% ( 2.3%)
1024 96.8% (15.1%)
2048 81.7% (33.0%)
4096 48.7% (42.5%)
8192 6.2% ( 6.2%)
91000 mean = 55241 max = 178784
parameter mean = 1.6274640457672958 min = -7259.0587871490425 max = 6477.305533296316
256 100.0% ( 0.5%)
512 99.5% ( 2.6%)
1024 96.9% (13.5%)
2048 83.4% (33.3%)
4096 50.1% (42.8%)
8192 7.3% ( 7.3%)
92000 mean = 52429 max = 177220
parameter mean = 1.6318069277652643 min = -7355.596804147162 max = 6134.214002995824
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 0.8%)
512 98.8% ( 3.4%)
1024 95.4% (13.0%)
2048 82.4% (34.8%)
4096 47.6% (41.7%)
8192 5.9% ( 5.9%)
93000 mean = 53196 max = 176908
parameter mean = 1.6571834176438884 min = -7395.123183171931 max = 6442.175016066894
128 100.0% ( 0.1%)
256 99.9% ( 0.4%)
512 99.5% ( 3.5%)
1024 96.0% (13.4%)
2048 82.6% (32.6%)
4096 50.0% (44.1%)
8192 5.9% ( 5.9%)
94000 mean = 54382 max = 173860
parameter mean = 1.675727360442215 min = -7541.959140000994 max = 6517.354366059809
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 1.3%)
512 98.5% ( 2.7%)
1024 95.8% (12.9%)
2048 82.9% (33.4%)
4096 49.5% (41.6%)
8192 7.9% ( 7.9%)
95000 mean = 43978 max = 154944
parameter mean = 1.6698778814922732 min = -7572.789701277566 max = 6304.535102266191
64 100.0% ( 0.1%)
128 99.9% ( 0.9%)
256 99.0% ( 1.9%)
512 97.1% ( 7.9%)
1024 89.2% (19.2%)
2048 70.0% (32.4%)
4096 37.6% (32.7%)
8192 4.9% ( 4.9%)
96000 mean = 54040 max = 160380
parameter mean = 1.6912278785732446 min = -7719.388457403322 max = 6510.281642123573
128 100.0% ( 0.3%)
256 99.7% ( 0.4%)
512 99.3% ( 3.2%)
1024 96.1% (12.8%)
2048 83.3% (32.1%)
4096 51.2% (44.6%)
8192 6.6% ( 6.6%)
97000 mean = 54919 max = 168684
parameter mean = 1.707997573418896 min = -7894.983875587813 max = 6280.380637503087
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.8%)
512 98.9% ( 2.5%)
1024 96.4% (12.7%)
2048 83.7% (34.6%)
4096 49.1% (40.5%)
8192 8.6% ( 8.6%)
98000 mean = 54004 max = 177120
parameter mean = 1.7254613897551536 min = -7987.297183766859 max = 6452.3736882617795
64 100.0% ( 0.1%)
128 99.9% ( 0.5%)
256 99.4% ( 0.6%)
512 98.8% ( 2.6%)
1024 96.2% (13.8%)
2048 82.4% (31.6%)
4096 50.8% (43.0%)
8192 7.8% ( 7.8%)
99000 mean = 57217 max = 177216
parameter mean = 1.7490571072457777 min = -8006.657356429137 max = 6541.148384830181
128 100.0% ( 0.1%)
256 99.9% ( 0.6%)
512 99.3% ( 1.5%)
1024 97.8% (12.0%)
2048 85.8% (31.9%)
4096 53.9% (46.0%)
8192 7.9% ( 7.9%)
100000 mean = 54750 max = 176264
parameter mean = 1.764116240567329 min = -8017.693119600374 max = 6523.890065745666
256 100.0% ( 0.9%)
512 99.1% ( 1.5%)
1024 97.6% (15.3%)
2048 82.3% (32.6%)
4096 49.7% (41.9%)
8192 7.8% ( 7.8%)
real 316m59.817s
user 316m55.019s
sys 0m3.848s
Run 5:

90000 mean = 52291 max = 179004
parameter mean = 1.577609627851082 min = -6410.535830712946 max = 6268.392610115872
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 0.5%)
512 99.0% ( 2.5%)
1024 96.5% (16.7%)
2048 79.8% (33.5%)
4096 46.3% (39.2%)
8192 7.1% ( 7.1%)
91000 mean = 50069 max = 177416
parameter mean = 1.58602477127092 min = -6416.8667830648565 max = 6116.279207138224
32 100.0% ( 0.1%)
64 99.9% ( 0.1%)
128 99.8% ( 0.2%)
256 99.6% ( 1.3%)
512 98.3% ( 3.4%)
1024 94.9% (15.7%)
2048 79.2% (34.8%)
4096 44.4% (37.9%)
8192 6.5% ( 6.5%)
92000 mean = 51593 max = 173324
parameter mean = 1.607900775327343 min = -6564.512670668733 max = 6032.862918894087
128 100.0% ( 0.3%)
256 99.7% ( 1.0%)
512 98.7% ( 3.0%)
1024 95.7% (18.2%)
2048 77.5% (32.4%)
4096 45.1% (37.3%)
8192 7.8% ( 7.8%)
93000 mean = 51460 max = 176452
parameter mean = 1.6219107678292823 min = -6678.20840805842 max = 6023.411176000316
128 100.0% ( 0.4%)
256 99.6% ( 0.7%)
512 98.9% ( 3.6%)
1024 95.3% (15.1%)
2048 80.2% (34.0%)
4096 46.2% (39.6%)
8192 6.6% ( 6.6%)
94000 mean = 52758 max = 173096
parameter mean = 1.6468634622310205 min = -6747.213382051944 max = 6295.77740079765
64 100.0% ( 0.2%)
256 99.8% ( 0.4%)
512 99.4% ( 3.2%)
1024 96.2% (15.7%)
2048 80.5% (32.1%)
4096 48.4% (41.9%)
8192 6.5% ( 6.5%)
95000 mean = 47877 max = 182596
parameter mean = 1.6551993956598714 min = -6748.069998331208 max = 6276.521022738931
128 100.0% ( 0.7%)
256 99.3% ( 0.7%)
512 98.6% ( 5.7%)
1024 92.9% (18.3%)
2048 74.6% (32.0%)
4096 42.6% (37.5%)
8192 5.1% ( 5.1%)
96000 mean = 52882 max = 182724
parameter mean = 1.6750486373931794 min = -6784.023377349516 max = 6130.883932820633
256 100.0% ( 0.6%)
512 99.4% ( 1.8%)
1024 97.6% (16.6%)
2048 81.0% (34.6%)
4096 46.4% (38.8%)
8192 7.6% ( 7.6%)
97000 mean = 52465 max = 181272
parameter mean = 1.6954002504581211 min = -6944.510544426434 max = 6366.379778545424
128 100.0% ( 0.4%)
256 99.6% ( 0.8%)
512 98.8% ( 3.4%)
1024 95.4% (13.9%)
2048 81.5% (33.9%)
4096 47.6% (40.7%)
8192 6.9% ( 6.9%)
98000 mean = 54046 max = 180240
parameter mean = 1.7180765658270194 min = -6851.223610376787 max = 6475.389397704871
128 100.0% ( 0.1%)
256 99.9% ( 0.1%)
512 99.8% ( 2.8%)
1024 97.0% (17.3%)
2048 79.7% (29.2%)
4096 50.5% (43.0%)
8192 7.5% ( 7.5%)
99000 mean = 54522 max = 178304
parameter mean = 1.7391720879705754 min = -6844.314693502977 max = 6668.951758086201
256 100.0% ( 0.3%)
512 99.7% ( 2.7%)
1024 97.0% (15.0%)
2048 82.0% (31.6%)
4096 50.4% (42.5%)
8192 7.9% ( 7.9%)
100000 mean = 53481 max = 171572
parameter mean = 1.756473951551881 min = -6983.806168335259 max = 6474.6341022093275
64 100.0% ( 0.1%)
256 99.9% ( 1.0%)
512 98.9% ( 3.1%)
1024 95.8% (15.8%)
2048 80.0% (34.1%)
4096 45.9% (37.2%)
8192 8.7% ( 8.7%)
real 299m16.055s
user 299m11.208s
sys 0m3.960s
Run 6:

90000 mean = 52543 max = 174820
parameter mean = 1.6059263026057993 min = -6550.597576404195 max = 6261.313263951085
64 100.0% ( 0.2%)
128 99.8% ( 0.2%)
256 99.6% ( 0.5%)
512 99.1% ( 2.7%)
1024 96.4% (16.3%)
2048 80.1% (31.8%)
4096 48.3% (42.5%)
8192 5.8% ( 5.8%)
91000 mean = 55358 max = 175220
parameter mean = 1.6251463892495772 min = -6702.8250139300135 max = 6310.60832493676
256 100.0% ( 0.7%)
512 99.3% ( 2.7%)
1024 96.6% (14.0%)
2048 82.6% (31.7%)
4096 50.9% (42.6%)
8192 8.3% ( 8.3%)
92000 mean = 56273 max = 236784
parameter mean = 1.6486751777321125 min = -6906.108255742565 max = 6371.227224105752
128 100.0% ( 0.2%)
256 99.8% ( 0.3%)
512 99.5% ( 2.3%)
1024 97.2% (14.0%)
2048 83.2% (30.0%)
4096 53.2% (45.4%)
8192 7.8% ( 7.7%)
16384 0.1% ( 0.1%)
93000 mean = 53268 max = 220724
parameter mean = 1.660701492927984 min = -7052.671176869955 max = 6368.542040877368
128 100.0% ( 0.2%)
256 99.8% ( 0.8%)
512 99.0% ( 4.3%)
1024 94.7% (14.4%)
2048 80.3% (30.3%)
4096 50.0% (43.9%)
8192 6.1% ( 6.0%)
16384 0.1% ( 0.1%)
94000 mean = 53880 max = 174016
parameter mean = 1.6784254950564135 min = -7083.17129906737 max = 6476.117469214218
64 100.0% ( 0.2%)
128 99.8% ( 0.1%)
256 99.7% ( 0.5%)
512 99.2% ( 3.1%)
1024 96.1% (14.7%)
2048 81.4% (31.7%)
4096 49.7% (42.0%)
8192 7.7% ( 7.7%)
95000 mean = 54367 max = 183256
parameter mean = 1.6994809013610876 min = -7219.06370794949 max = 6534.607678601115
64 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.3%)
1024 97.0% (16.6%)
2048 80.4% (30.1%)
4096 50.3% (41.9%)
8192 8.4% ( 8.4%)
96000 mean = 54792 max = 176636
parameter mean = 1.7165451311948388 min = -7323.116142364778 max = 6542.238758733516
64 100.0% ( 0.1%)
256 99.9% ( 0.8%)
512 99.1% ( 2.2%)
1024 96.9% (14.1%)
2048 82.8% (33.2%)
4096 49.6% (40.9%)
8192 8.7% ( 8.7%)
97000 mean = 47666 max = 177616
parameter mean = 1.716929942156352 min = -7406.573279954264 max = 6257.871826491511
64 100.0% ( 0.4%)
128 99.6% ( 0.3%)
256 99.3% ( 1.8%)
512 97.5% ( 3.7%)
1024 93.8% (17.9%)
2048 75.9% (35.4%)
4096 40.5% (35.5%)
8192 5.0% ( 5.0%)
98000 mean = 54499 max = 177628
parameter mean = 1.743440598860856 min = -7603.90460458944 max = 6365.018691973732
128 100.0% ( 0.3%)
256 99.7% ( 0.7%)
512 99.0% ( 3.1%)
1024 95.9% (14.2%)
2048 81.7% (32.1%)
4096 49.6% (41.2%)
8192 8.4% ( 8.4%)
99000 mean = 55059 max = 178140
parameter mean = 1.7682309894805486 min = -7638.215075618973 max = 6565.107432410588
32 100.0% ( 0.1%)
64 99.9% ( 0.1%)
128 99.8% ( 0.2%)
256 99.6% ( 0.5%)
512 99.1% ( 2.3%)
1024 96.8% (16.2%)
2048 80.6% (30.2%)
4096 50.4% (42.4%)
8192 8.0% ( 8.0%)
100000 mean = 55167 max = 169020
parameter mean = 1.7892723967535384 min = -7792.923437823282 max = 6686.816646242725
256 100.0% ( 0.1%)
512 99.9% ( 2.8%)
1024 97.1% (15.3%)
2048 81.8% (31.8%)
4096 50.0% (42.3%)
8192 7.7% ( 7.7%)
real 319m53.779s
user 319m48.061s
sys 0m4.412s
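The log blocks above can be reproduced from per-episode records with a small helper like the one below. It is a sketch, not code from either repository: the name make_statistic, its arguments, and passing the weights as one flat iterable are assumptions. It prints the header line, the parameter line seen in runs 3 to 6, and one line per tile: the cumulative share of games that reached at least that tile, then in parentheses the share whose largest tile was exactly that value.

```python
from collections import Counter


def make_statistic(n, scores, max_tiles, weights, unit=1000):
    """Build one log block from the last `unit` episodes (hypothetical helper).

    n         -- total number of training episodes so far
    scores    -- final score of each of the last `unit` episodes
    max_tiles -- largest tile value (e.g. 2048) reached in each of those episodes
    weights   -- flat iterable over every n-tuple lookup-table entry
    """
    assert len(scores) == unit and len(max_tiles) == unit
    lines = [f"{n} mean = {sum(scores) / unit:.0f} max = {max(scores)}"]

    w = list(weights)
    lines.append(f"parameter mean = {sum(w) / len(w)} min = {min(w)} max = {max(w)}")

    counts = Counter(max_tiles)  # how many games ended with each largest tile
    reached = unit               # games whose largest tile is >= the current tile
    for tile in sorted(counts):
        exact = counts[tile]
        lines.append(
            f"{tile} {reached * 100 / unit:.1f}% ({exact * 100 / unit:4.1f}%)"
        )
        reached -= exact
    return "\n".join(lines)
```

For example, make_statistic(4, [100, 200, 300, 400], [128, 256, 256, 512], [-1.0, 0.0, 4.0], unit=4) produces the lines "4 mean = 250 max = 400", "parameter mean = 1.0 min = -1.0 max = 4.0", "128 100.0% (25.0%)", "256 75.0% (50.0%)" and "512 25.0% (25.0%)". Counting down from unit is what makes the first percentage cumulative: each tile's line subtracts the games that stopped at smaller tiles.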
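Looking at the statistics of the n-tuple network parameters: over the last 10,000 of the 100,000 training episodes, the parameter mean changes little (it stays between about 1.5 and 1.8), but the maximum and minimum reach several thousand in the positive and negative directions (in run 3 the minimum is below -12,000). The gap between the largest and smallest parameters is enormous. I have previously reproduced another TDL method as well:
Revisiting the Game 2048 - AI Methods - Origins and Endings (5) - The First Reinforcement Learning Method for 2048: "Temporal Difference Learning of N-Tuple Networks for the Game 2048"
Code: https://gitee.com/devilmaycry812839668/td-tuple-net-for-2048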
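A major reason that earlier reproduction performed poorly was that its parameters overflowed in both directions. In this reproduction the gap between the largest and smallest parameters is still large, but since each episode of this game is long (thousands or even tens of thousands of moves), that is to be expected. Why did the parameters overflow in the earlier reproduction but stay within a controlled range this time? In my view, the main improvement is that here each feature is rotated and mirrored to form 8 isomorphic features, and all 8 share a single lookup table. Each update to a lookup table is therefore spread out rather than concentrated, which avoids the situation early in training where similar board states appear frequently and certain lookup-table entries receive too many updates; this in turn keeps the lookup-table values from overflowing. In other words, I believe that deriving from each feature 8 isomorphic features that share one lookup table is the key to making the TDL algorithm work.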
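The symmetry sharing described above can be sketched as follows, assuming a board stored as a flat list of 16 tile exponents (0 to 15) in row-major order, so cell index = 4 * row + col. The names rotate, reflect, isomorphisms and SharedPatternFeature are hypothetical and not taken from either repository, and dividing each update evenly across the 8 isomorphic entries is an assumption of this sketch. A plain Python list serves as the lookup table; for longer patterns (for example 6 cells) it grows to 16**6 entries, so a real implementation would likely use a more compact array.

```python
def rotate(idx):
    """Cell index after a 90-degree clockwise turn: (r, c) -> (c, 3 - r)."""
    r, c = divmod(idx, 4)
    return c * 4 + (3 - r)


def reflect(idx):
    """Cell index after a left-right mirror: (r, c) -> (r, 3 - c)."""
    r, c = divmod(idx, 4)
    return r * 4 + (3 - c)


def isomorphisms(pattern):
    """The 8 rotated/mirrored copies of a pattern (a tuple of cell indices)."""
    result = []
    for mirrored in (False, True):
        cells = [reflect(i) for i in pattern] if mirrored else list(pattern)
        for _ in range(4):
            result.append(tuple(cells))
            cells = [rotate(i) for i in cells]
    return result


class SharedPatternFeature:
    """One n-tuple feature whose 8 isomorphic copies read and write a single table."""

    def __init__(self, pattern):
        self.isos = isomorphisms(pattern)
        self.weights = [0.0] * (16 ** len(pattern))  # the one shared lookup table

    @staticmethod
    def index(board, cells):
        # Treat the tile exponents (0..15) under `cells` as base-16 digits.
        key = 0
        for cell in cells:
            key = key * 16 + board[cell]
        return key

    def estimate(self, board):
        return sum(self.weights[self.index(board, iso)] for iso in self.isos)

    def update(self, board, delta):
        # Spread the correction evenly over the 8 entries touched by this board.
        share = delta / len(self.isos)
        for iso in self.isos:
            self.weights[self.index(board, iso)] += share
        return self.estimate(board)
```

Because rotating or mirroring a board only changes which of the 8 copies reads which cells, estimate returns the same value for all 8 symmetric versions of a position, and one update writes to up to 8 different table entries instead of one.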
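The TDL algorithm here is essentially TD(0), and it can also be viewed as a close relative of Q-learning. In 2048 a state is represented as data of the form [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], and such state features are hard to use efficiently with neural networks, so for 2048 the state-of-the-art approach is to represent game states numerically with an N-tuple network.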
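The TD(0) update on afterstates (the board right after the player's move, before the random tile appears) can be sketched as below. The rule is V(s'_t) <- V(s'_t) + alpha * (r_{t+1} + V(s'_{t+1}) - V(s'_t)). Assumptions of this sketch: an episode is recorded as a list of (afterstate, reward) steps, a plain dict stands in for the n-tuple network, and the updates are applied after the episode in reverse order, so each target uses the already-updated value of the next afterstate; an online variant would apply the same rule move by move. Step, TableValue and td0_update_episode are hypothetical names.

```python
from dataclasses import dataclass
from typing import Hashable, List


@dataclass
class Step:
    afterstate: Hashable  # board right after the move, before the random tile
    reward: float         # score gained by the move that produced this afterstate


class TableValue:
    """Stand-in value function: a plain dict instead of an n-tuple network."""

    def __init__(self):
        self.table = {}

    def estimate(self, s):
        return self.table.get(s, 0.0)

    def update(self, s, delta):
        self.table[s] = self.estimate(s) + delta
        return self.table[s]


def td0_update_episode(value, path: List[Step], alpha=0.1):
    """Backward TD(0) over the afterstates of one finished episode.

    The last afterstate gets target 0 because no further reward follows it.
    For earlier afterstates the target is r_{t+1} + V(s'_{t+1}).
    """
    target = 0.0
    for step in reversed(path):
        error = target - value.estimate(step.afterstate)
        updated = value.update(step.afterstate, alpha * error)
        target = step.reward + updated  # target for the previous afterstate
    return value
```

Replaying the same two-step episode ("a" with reward 4, then "b" with reward 8) at alpha = 0.5 moves V("a") from 0 to 4 and then to 6, converging toward the true return of 8 that follows "a".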
Note:
For the details of the TDL algorithm for 2048, refer to the paper "Temporal Difference Learning of N-Tuple Networks for the Game 2048".
=====================================================
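Judging from the results of the C++ and Python versions, this reimplementation is a success: the algorithm logic and parameter settings match the C++ version, and the results are comparable. The only difference is the total running time. The original C++ version finishes in 60 minutes (one hour), while the Python version takes about 300 minutes (five hours), i.e. five times as long; given the nature of Python, this is entirely acceptable. Although the C++ version needs only one fifth of the Python version's time, the C++ code is hard to follow: even I can only read it, not write C++ fluently. The Python version is hardly usable in real applications, but it still has some reference value. It is the only Python implementation of the TDL algorithm for 2048 that I have been able to find online so far, and the lack of one is exactly why I wrote it myself.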
=========================================
posted on 2022-08-22 22:35 Angry_Panda