Revisiting the Game 2048: AI Methods, Origins and Endings (7): A Python Implementation of the TDL Algorithm for 2048

Play 2048 online:

https://play2048.co/

 

 

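The question of how to solve 2048 goes back to a discussion thread on a foreign site. That thread was the earliest discussion of how to solve the game, so it can be called the "origin":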

What is the optimal algorithm for the game 2048?

 

 

Earlier posts in this series have already covered the game:

Revisiting the Game 2048: AI Methods, Origins and Endings (1): Running the Game Automatically in Firefox

 

 

 

===========================================

 

 

 

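I found a C++ implementation of a TDL solution to 2048 online: https://github.com/moporgic/TDL2048-Demo

This post introduces a Python reimplementation of the TDL algorithm, refactored from that C++ implementation.

The Python code is available at:

https://gitee.com/devilmaycry812839668/tdl2048-python-demo

 

This is a Python rewrite of a TDL solution to 2048 written by another developer, and its performance cannot compare with the original C++ implementation. The value of the repository lies in demonstrating the code logic, not in runtime performance. The TDL algorithm used here follows the paper cited in the README, but note that it is not a strict implementation: only part of the algorithm in the paper is implemented.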

 

 

=============================================

 

 

Results of running the C++ implementation (https://github.com/moporgic/TDL2048-Demo):

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

================================================

 

 

 

Code rewritten in Python, following the logic and parameter settings of the C++ version:

Python code address:

https://gitee.com/devilmaycry812839668/tdl2048-python-demo

 

 

Results:

Run 1:
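A note on reading the logs in this section. Each block starts with the episode count, followed by the mean and maximum score, apparently over the most recent 1000 episodes. In each tile row, the first percentage is the share of games that reached at least that tile, and the percentage in parentheses is the share of games that ended with that tile as their largest. This reading is inferred from the numbers themselves (each row's first percentage minus its parenthesized percentage gives the next row's first percentage). The sketch below shows how such a table could be computed; the function name `summarize` and the sample data are hypothetical and are not taken from either repository.

```python
from collections import Counter

def summarize(max_tiles):
    """Return rows (tile, reach_pct, end_pct) for a batch of finished games.

    max_tiles holds the largest tile of each game (hypothetical input format).
    reach_pct: share of games whose largest tile is >= tile.
    end_pct:   share of games whose largest tile is exactly tile.
    """
    n = len(max_tiles)
    ended = Counter(max_tiles)
    rows = []
    remaining = n  # games whose largest tile is at least the current tile
    for tile in sorted(ended):
        rows.append((tile, 100.0 * remaining / n, 100.0 * ended[tile] / n))
        remaining -= ended[tile]
    return rows

# Hypothetical batch of 10 finished games, not real results.
games = [1024, 2048, 2048, 4096, 4096, 4096, 4096, 8192, 2048, 4096]
for tile, reach, end in summarize(games):
    print(f"{tile:>9}  {reach:5.1f}%  ({end:4.1f}%)")
```

Read this way, a row such as `2048  79.2%  (33.6%)` means that 79.2% of the games in the batch reached the 2048 tile and 33.6% stopped there.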

 

90000      mean = 51467      max = 171436
     128        100.0%      ( 0.6%)
     256         99.4%      ( 0.5%)
     512         98.9%      ( 2.2%)
     1024         96.7%      (17.5%)
     2048         79.2%      (33.6%)
     4096         45.6%      (39.5%)
     8192          6.1%      ( 6.1%)
91000      mean = 52003      max = 175620
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.1%)
     256         99.8%      ( 0.5%)
     512         99.3%      ( 3.0%)
     1024         96.3%      (16.7%)
     2048         79.6%      (33.3%)
     4096         46.3%      (39.7%)
     8192          6.6%      ( 6.6%)
92000      mean = 51713      max = 179520
     64        100.0%      ( 0.1%)
     256         99.9%      ( 0.6%)
     512         99.3%      ( 3.4%)
     1024         95.9%      (16.3%)
     2048         79.6%      (33.0%)
     4096         46.6%      (40.3%)
     8192          6.3%      ( 6.3%)
93000      mean = 51767      max = 184252
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.3%)
     256         99.6%      ( 1.1%)
     512         98.5%      ( 3.6%)
     1024         94.9%      (16.2%)
     2048         78.7%      (32.7%)
     4096         46.0%      (39.2%)
     8192          6.8%      ( 6.8%)
94000      mean = 52040      max = 177072
     64        100.0%      ( 0.4%)
     128         99.6%      ( 0.3%)
     256         99.3%      ( 1.4%)
     512         97.9%      ( 4.0%)
     1024         93.9%      (16.9%)
     2048         77.0%      (30.6%)
     4096         46.4%      (38.7%)
     8192          7.7%      ( 7.7%)
95000      mean = 53317      max = 177396
     32        100.0%      ( 0.1%)
     64         99.9%      ( 0.2%)
     128         99.7%      ( 0.1%)
     256         99.6%      ( 0.5%)
     512         99.1%      ( 1.8%)
     1024         97.3%      (14.8%)
     2048         82.5%      (33.8%)
     4096         48.7%      (41.6%)
     8192          7.1%      ( 7.1%)
96000      mean = 55690      max = 174236
     256        100.0%      ( 0.4%)
     512         99.6%      ( 1.8%)
     1024         97.8%      (12.1%)
     2048         85.7%      (34.7%)
     4096         51.0%      (43.7%)
     8192          7.3%      ( 7.3%)
97000      mean = 52997      max = 177124
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.1%)
     256         99.8%      ( 1.0%)
     512         98.8%      ( 3.0%)
     1024         95.8%      (16.9%)
     2048         78.9%      (31.9%)
     4096         47.0%      (39.5%)
     8192          7.5%      ( 7.5%)
98000      mean = 52491      max = 178280
     128        100.0%      ( 0.6%)
     256         99.4%      ( 1.1%)
     512         98.3%      ( 2.8%)
     1024         95.5%      (15.3%)
     2048         80.2%      (33.1%)
     4096         47.1%      (40.0%)
     8192          7.1%      ( 7.1%)
99000      mean = 52579      max = 231680
     128        100.0%      ( 0.2%)
     256         99.8%      ( 1.5%)
     512         98.3%      ( 1.9%)
     1024         96.4%      (17.0%)
     2048         79.4%      (32.3%)
     4096         47.1%      (39.8%)
     8192          7.3%      ( 7.2%)
     16384          0.1%      ( 0.1%)
100000      mean = 54323      max = 176672
     64        100.0%      ( 0.2%)
     256         99.8%      ( 0.4%)
     512         99.4%      ( 2.1%)
     1024         97.3%      (15.5%)
     2048         81.8%      (31.6%)
     4096         50.2%      (42.8%)
     8192          7.4%      ( 7.4%)

real    299m7.984s
user    299m2.076s
sys    0m4.188s

 

 

 

 

Run 2:

 

90000      mean = 51416      max = 169020
     128        100.0%      ( 0.3%)
     256         99.7%      ( 1.0%)
     512         98.7%      ( 3.9%)
     1024         94.8%      (17.8%)
     2048         77.0%      (30.5%)
     4096         46.5%      (39.9%)
     8192          6.6%      ( 6.6%)
91000      mean = 52111      max = 176432
     32        100.0%      ( 0.1%)
     256         99.9%      ( 0.7%)
     512         99.2%      ( 2.8%)
     1024         96.4%      (14.9%)
     2048         81.5%      (33.0%)
     4096         48.5%      (43.0%)
     8192          5.5%      ( 5.5%)
92000      mean = 52641      max = 175848
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.2%)
     256         99.7%      ( 0.7%)
     512         99.0%      ( 2.9%)
     1024         96.1%      (15.2%)
     2048         80.9%      (34.0%)
     4096         46.9%      (40.4%)
     8192          6.5%      ( 6.5%)
93000      mean = 53224      max = 177336
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.3%)
     256         99.6%      ( 0.9%)
     512         98.7%      ( 3.0%)
     1024         95.7%      (14.4%)
     2048         81.3%      (32.5%)
     4096         48.8%      (41.6%)
     8192          7.2%      ( 7.2%)
94000      mean = 53501      max = 181752
     128        100.0%      ( 0.2%)
     256         99.8%      ( 0.6%)
     512         99.2%      ( 2.3%)
     1024         96.9%      (15.5%)
     2048         81.4%      (32.1%)
     4096         49.3%      (42.9%)
     8192          6.4%      ( 6.4%)
95000      mean = 54450      max = 173708
     64        100.0%      ( 0.3%)
     128         99.7%      ( 0.2%)
     256         99.5%      ( 0.5%)
     512         99.0%      ( 2.8%)
     1024         96.2%      (15.3%)
     2048         80.9%      (29.9%)
     4096         51.0%      (43.8%)
     8192          7.2%      ( 7.2%)
96000      mean = 55262      max = 226556
     32        100.0%      ( 0.1%)
     64         99.9%      ( 0.2%)
     128         99.7%      ( 0.2%)
     256         99.5%      ( 0.4%)
     512         99.1%      ( 2.6%)
     1024         96.5%      (15.6%)
     2048         80.9%      (30.3%)
     4096         50.6%      (42.3%)
     8192          8.3%      ( 8.2%)
     16384          0.1%      ( 0.1%)
97000      mean = 53725      max = 177588
     128        100.0%      ( 0.1%)
     256         99.9%      ( 0.1%)
     512         99.8%      ( 3.4%)
     1024         96.4%      (14.9%)
     2048         81.5%      (32.4%)
     4096         49.1%      (42.0%)
     8192          7.1%      ( 7.1%)
98000      mean = 54402      max = 176952
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.2%)
     256         99.7%      ( 1.0%)
     512         98.7%      ( 2.4%)
     1024         96.3%      (15.6%)
     2048         80.7%      (31.6%)
     4096         49.1%      (40.9%)
     8192          8.2%      ( 8.2%)
99000      mean = 55552      max = 176284
     64        100.0%      ( 0.2%)
     128         99.8%      ( 0.2%)
     256         99.6%      ( 0.9%)
     512         98.7%      ( 2.2%)
     1024         96.5%      (14.7%)
     2048         81.8%      (29.5%)
     4096         52.3%      (44.5%)
     8192          7.8%      ( 7.8%)
100000      mean = 55576      max = 174112
     64        100.0%      ( 0.1%)
     128         99.9%      ( 0.2%)
     256         99.7%      ( 0.6%)
     512         99.1%      ( 1.4%)
     1024         97.7%      (15.5%)
     2048         82.2%      (31.2%)
     4096         51.0%      (43.1%)
     8192          7.9%      ( 7.9%)

real    306m48.606s
user    306m42.380s
sys    0m4.352s

 

 

 

 

Run 3:

 

 

90000      mean = 48991      max = 158096
parameter      mean = 1.5322926465899345      min = -12611.286509352407      max = 5502.92779377629
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.9%)
          512         99.0%      ( 4.1%)
         1024         94.9%      (15.4%)
         2048         79.5%      (36.0%)
         4096         43.5%      (38.2%)
         8192          5.3%      ( 5.3%)
91000      mean = 49475      max = 175724
parameter      mean = 1.5583559250936798      min = -12611.286509352407      max = 5561.9897442909605
          128        100.0%      ( 0.2%)
          256         99.8%      ( 1.0%)
          512         98.8%      ( 3.7%)
         1024         95.1%      (15.7%)
         2048         79.4%      (34.6%)
         4096         44.8%      (40.2%)
         8192          4.6%      ( 4.6%)
92000      mean = 47784      max = 162568
parameter      mean = 1.5773082791168676      min = -12611.286509352407      max = 5678.766629401056
           32        100.0%      ( 0.1%)
           64         99.9%      ( 0.2%)
          128         99.7%      ( 0.2%)
          256         99.5%      ( 1.3%)
          512         98.2%      ( 3.5%)
         1024         94.7%      (18.3%)
         2048         76.4%      (36.3%)
         4096         40.1%      (34.3%)
         8192          5.8%      ( 5.8%)
93000      mean = 50350      max = 156552
parameter      mean = 1.6007116787981626      min = -12611.286509352407      max = 5777.117118806266
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.9%)
          512         99.0%      ( 3.8%)
         1024         95.2%      (16.1%)
         2048         79.1%      (33.4%)
         4096         45.7%      (41.2%)
         8192          4.5%      ( 4.5%)
94000      mean = 50930      max = 162044
parameter      mean = 1.624377595843311      min = -12611.286509352407      max = 5785.769207357467
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.8%)
          512         99.1%      ( 2.6%)
         1024         96.5%      (15.9%)
         2048         80.6%      (35.1%)
         4096         45.5%      (39.8%)
         8192          5.7%      ( 5.7%)
95000      mean = 51721      max = 176832
parameter      mean = 1.644080028640483      min = -12611.286509352407      max = 5738.999075391805
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.5%)
          512         99.3%      ( 3.3%)
         1024         96.0%      (15.3%)
         2048         80.7%      (33.7%)
         4096         47.0%      (39.5%)
         8192          7.5%      ( 7.5%)
96000      mean = 51399      max = 163788
parameter      mean = 1.6663415442828855      min = -12611.286509352407      max = 5845.513382853591
          128        100.0%      ( 0.3%)
          256         99.7%      ( 0.5%)
          512         99.2%      ( 2.5%)
         1024         96.7%      (16.1%)
         2048         80.6%      (34.8%)
         4096         45.8%      (39.7%)
         8192          6.1%      ( 6.1%)
97000      mean = 51230      max = 166544
parameter      mean = 1.686732530926021      min = -12611.286509352407      max = 5774.039565288521
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.5%)
          512         99.3%      ( 2.9%)
         1024         96.4%      (15.1%)
         2048         81.3%      (34.7%)
         4096         46.6%      (40.9%)
         8192          5.7%      ( 5.7%)
98000      mean = 52740      max = 160900
parameter      mean = 1.7095415726931256      min = -12563.946177760263      max = 5843.970362728986
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.5%)
          512         99.3%      ( 2.8%)
         1024         96.5%      (13.5%)
         2048         83.0%      (33.7%)
         4096         49.3%      (44.0%)
         8192          5.3%      ( 5.3%)
99000      mean = 52669      max = 169220
parameter      mean = 1.7307740083579728      min = -12476.152646821502      max = 5989.542236493472
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.2%)
          256         99.7%      ( 0.5%)
          512         99.2%      ( 3.6%)
         1024         95.6%      (16.2%)
         2048         79.4%      (30.9%)
         4096         48.5%      (41.7%)
         8192          6.8%      ( 6.8%)
100000      mean = 50804      max = 177192
parameter      mean = 1.7455929446279508      min = -12425.416359450286      max = 5934.462230068022
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.8%)
          512         99.0%      ( 3.2%)
         1024         95.8%      (16.5%)
         2048         79.3%      (33.0%)
         4096         46.3%      (41.1%)
         8192          5.2%      ( 5.2%)

real    296m31.332s
user    296m26.503s
sys    0m3.752s

 

 

 

 

 

 

Run 4:

 

 

 

90000      mean = 52461      max = 173552
parameter      mean = 1.6053481665698048      min = -7208.176952683409      max = 6427.410230296034
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.1%)
          256         99.8%      ( 0.7%)
          512         99.1%      ( 2.3%)
         1024         96.8%      (15.1%)
         2048         81.7%      (33.0%)
         4096         48.7%      (42.5%)
         8192          6.2%      ( 6.2%)
91000      mean = 55241      max = 178784
parameter      mean = 1.6274640457672958      min = -7259.0587871490425      max = 6477.305533296316
          256        100.0%      ( 0.5%)
          512         99.5%      ( 2.6%)
         1024         96.9%      (13.5%)
         2048         83.4%      (33.3%)
         4096         50.1%      (42.8%)
         8192          7.3%      ( 7.3%)
92000      mean = 52429      max = 177220
parameter      mean = 1.6318069277652643      min = -7355.596804147162      max = 6134.214002995824
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.3%)
          256         99.6%      ( 0.8%)
          512         98.8%      ( 3.4%)
         1024         95.4%      (13.0%)
         2048         82.4%      (34.8%)
         4096         47.6%      (41.7%)
         8192          5.9%      ( 5.9%)
93000      mean = 53196      max = 176908
parameter      mean = 1.6571834176438884      min = -7395.123183171931      max = 6442.175016066894
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.4%)
          512         99.5%      ( 3.5%)
         1024         96.0%      (13.4%)
         2048         82.6%      (32.6%)
         4096         50.0%      (44.1%)
         8192          5.9%      ( 5.9%)
94000      mean = 54382      max = 173860
parameter      mean = 1.675727360442215      min = -7541.959140000994      max = 6517.354366059809
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.1%)
          256         99.8%      ( 1.3%)
          512         98.5%      ( 2.7%)
         1024         95.8%      (12.9%)
         2048         82.9%      (33.4%)
         4096         49.5%      (41.6%)
         8192          7.9%      ( 7.9%)
95000      mean = 43978      max = 154944
parameter      mean = 1.6698778814922732      min = -7572.789701277566      max = 6304.535102266191
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.9%)
          256         99.0%      ( 1.9%)
          512         97.1%      ( 7.9%)
         1024         89.2%      (19.2%)
         2048         70.0%      (32.4%)
         4096         37.6%      (32.7%)
         8192          4.9%      ( 4.9%)
96000      mean = 54040      max = 160380
parameter      mean = 1.6912278785732446      min = -7719.388457403322      max = 6510.281642123573
          128        100.0%      ( 0.3%)
          256         99.7%      ( 0.4%)
          512         99.3%      ( 3.2%)
         1024         96.1%      (12.8%)
         2048         83.3%      (32.1%)
         4096         51.2%      (44.6%)
         8192          6.6%      ( 6.6%)
97000      mean = 54919      max = 168684
parameter      mean = 1.707997573418896      min = -7894.983875587813      max = 6280.380637503087
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.2%)
          256         99.7%      ( 0.8%)
          512         98.9%      ( 2.5%)
         1024         96.4%      (12.7%)
         2048         83.7%      (34.6%)
         4096         49.1%      (40.5%)
         8192          8.6%      ( 8.6%)
98000      mean = 54004      max = 177120
parameter      mean = 1.7254613897551536      min = -7987.297183766859      max = 6452.3736882617795
           64        100.0%      ( 0.1%)
          128         99.9%      ( 0.5%)
          256         99.4%      ( 0.6%)
          512         98.8%      ( 2.6%)
         1024         96.2%      (13.8%)
         2048         82.4%      (31.6%)
         4096         50.8%      (43.0%)
         8192          7.8%      ( 7.8%)
99000      mean = 57217      max = 177216
parameter      mean = 1.7490571072457777      min = -8006.657356429137      max = 6541.148384830181
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.6%)
          512         99.3%      ( 1.5%)
         1024         97.8%      (12.0%)
         2048         85.8%      (31.9%)
         4096         53.9%      (46.0%)
         8192          7.9%      ( 7.9%)
100000      mean = 54750      max = 176264
parameter      mean = 1.764116240567329      min = -8017.693119600374      max = 6523.890065745666
          256        100.0%      ( 0.9%)
          512         99.1%      ( 1.5%)
         1024         97.6%      (15.3%)
         2048         82.3%      (32.6%)
         4096         49.7%      (41.9%)
         8192          7.8%      ( 7.8%)

real    316m59.817s
user    316m55.019s
sys    0m3.848s

 

 

 

 

 

Run 5:

 

 

90000      mean = 52291      max = 179004
parameter      mean = 1.577609627851082      min = -6410.535830712946      max = 6268.392610115872
           32        100.0%      ( 0.1%)
           64         99.9%      ( 0.2%)
          128         99.7%      ( 0.2%)
          256         99.5%      ( 0.5%)
          512         99.0%      ( 2.5%)
         1024         96.5%      (16.7%)
         2048         79.8%      (33.5%)
         4096         46.3%      (39.2%)
         8192          7.1%      ( 7.1%)
91000      mean = 50069      max = 177416
parameter      mean = 1.58602477127092      min = -6416.8667830648565      max = 6116.279207138224
           32        100.0%      ( 0.1%)
           64         99.9%      ( 0.1%)
          128         99.8%      ( 0.2%)
          256         99.6%      ( 1.3%)
          512         98.3%      ( 3.4%)
         1024         94.9%      (15.7%)
         2048         79.2%      (34.8%)
         4096         44.4%      (37.9%)
         8192          6.5%      ( 6.5%)
92000      mean = 51593      max = 173324
parameter      mean = 1.607900775327343      min = -6564.512670668733      max = 6032.862918894087
          128        100.0%      ( 0.3%)
          256         99.7%      ( 1.0%)
          512         98.7%      ( 3.0%)
         1024         95.7%      (18.2%)
         2048         77.5%      (32.4%)
         4096         45.1%      (37.3%)
         8192          7.8%      ( 7.8%)
93000      mean = 51460      max = 176452
parameter      mean = 1.6219107678292823      min = -6678.20840805842      max = 6023.411176000316
          128        100.0%      ( 0.4%)
          256         99.6%      ( 0.7%)
          512         98.9%      ( 3.6%)
         1024         95.3%      (15.1%)
         2048         80.2%      (34.0%)
         4096         46.2%      (39.6%)
         8192          6.6%      ( 6.6%)
94000      mean = 52758      max = 173096
parameter      mean = 1.6468634622310205      min = -6747.213382051944      max = 6295.77740079765
           64        100.0%      ( 0.2%)
          256         99.8%      ( 0.4%)
          512         99.4%      ( 3.2%)
         1024         96.2%      (15.7%)
         2048         80.5%      (32.1%)
         4096         48.4%      (41.9%)
         8192          6.5%      ( 6.5%)
95000      mean = 47877      max = 182596
parameter      mean = 1.6551993956598714      min = -6748.069998331208      max = 6276.521022738931
          128        100.0%      ( 0.7%)
          256         99.3%      ( 0.7%)
          512         98.6%      ( 5.7%)
         1024         92.9%      (18.3%)
         2048         74.6%      (32.0%)
         4096         42.6%      (37.5%)
         8192          5.1%      ( 5.1%)
96000      mean = 52882      max = 182724
parameter      mean = 1.6750486373931794      min = -6784.023377349516      max = 6130.883932820633
          256        100.0%      ( 0.6%)
          512         99.4%      ( 1.8%)
         1024         97.6%      (16.6%)
         2048         81.0%      (34.6%)
         4096         46.4%      (38.8%)
         8192          7.6%      ( 7.6%)
97000      mean = 52465      max = 181272
parameter      mean = 1.6954002504581211      min = -6944.510544426434      max = 6366.379778545424
          128        100.0%      ( 0.4%)
          256         99.6%      ( 0.8%)
          512         98.8%      ( 3.4%)
         1024         95.4%      (13.9%)
         2048         81.5%      (33.9%)
         4096         47.6%      (40.7%)
         8192          6.9%      ( 6.9%)
98000      mean = 54046      max = 180240
parameter      mean = 1.7180765658270194      min = -6851.223610376787      max = 6475.389397704871
          128        100.0%      ( 0.1%)
          256         99.9%      ( 0.1%)
          512         99.8%      ( 2.8%)
         1024         97.0%      (17.3%)
         2048         79.7%      (29.2%)
         4096         50.5%      (43.0%)
         8192          7.5%      ( 7.5%)
99000      mean = 54522      max = 178304
parameter      mean = 1.7391720879705754      min = -6844.314693502977      max = 6668.951758086201
          256        100.0%      ( 0.3%)
          512         99.7%      ( 2.7%)
         1024         97.0%      (15.0%)
         2048         82.0%      (31.6%)
         4096         50.4%      (42.5%)
         8192          7.9%      ( 7.9%)
100000      mean = 53481      max = 171572
parameter      mean = 1.756473951551881      min = -6983.806168335259      max = 6474.6341022093275
           64        100.0%      ( 0.1%)
          256         99.9%      ( 1.0%)
          512         98.9%      ( 3.1%)
         1024         95.8%      (15.8%)
         2048         80.0%      (34.1%)
         4096         45.9%      (37.2%)
         8192          8.7%      ( 8.7%)

real    299m16.055s
user    299m11.208s
sys    0m3.960s

 

 

 

 

 

Run 6:

 

 

90000      mean = 52543      max = 174820
parameter      mean = 1.6059263026057993      min = -6550.597576404195      max = 6261.313263951085
           64        100.0%      ( 0.2%)
          128         99.8%      ( 0.2%)
          256         99.6%      ( 0.5%)
          512         99.1%      ( 2.7%)
         1024         96.4%      (16.3%)
         2048         80.1%      (31.8%)
         4096         48.3%      (42.5%)
         8192          5.8%      ( 5.8%)
91000      mean = 55358      max = 175220
parameter      mean = 1.6251463892495772      min = -6702.8250139300135      max = 6310.60832493676
          256        100.0%      ( 0.7%)
          512         99.3%      ( 2.7%)
         1024         96.6%      (14.0%)
         2048         82.6%      (31.7%)
         4096         50.9%      (42.6%)
         8192          8.3%      ( 8.3%)
92000      mean = 56273      max = 236784
parameter      mean = 1.6486751777321125      min = -6906.108255742565      max = 6371.227224105752
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.3%)
          512         99.5%      ( 2.3%)
         1024         97.2%      (14.0%)
         2048         83.2%      (30.0%)
         4096         53.2%      (45.4%)
         8192          7.8%      ( 7.7%)
        16384          0.1%      ( 0.1%)
93000      mean = 53268      max = 220724
parameter      mean = 1.660701492927984      min = -7052.671176869955      max = 6368.542040877368
          128        100.0%      ( 0.2%)
          256         99.8%      ( 0.8%)
          512         99.0%      ( 4.3%)
         1024         94.7%      (14.4%)
         2048         80.3%      (30.3%)
         4096         50.0%      (43.9%)
         8192          6.1%      ( 6.0%)
        16384          0.1%      ( 0.1%)
94000      mean = 53880      max = 174016
parameter      mean = 1.6784254950564135      min = -7083.17129906737      max = 6476.117469214218
           64        100.0%      ( 0.2%)
          128         99.8%      ( 0.1%)
          256         99.7%      ( 0.5%)
          512         99.2%      ( 3.1%)
         1024         96.1%      (14.7%)
         2048         81.4%      (31.7%)
         4096         49.7%      (42.0%)
         8192          7.7%      ( 7.7%)
95000      mean = 54367      max = 183256
parameter      mean = 1.6994809013610876      min = -7219.06370794949      max = 6534.607678601115
           64        100.0%      ( 0.2%)
          256         99.8%      ( 0.5%)
          512         99.3%      ( 2.3%)
         1024         97.0%      (16.6%)
         2048         80.4%      (30.1%)
         4096         50.3%      (41.9%)
         8192          8.4%      ( 8.4%)
96000      mean = 54792      max = 176636
parameter      mean = 1.7165451311948388      min = -7323.116142364778      max = 6542.238758733516
           64        100.0%      ( 0.1%)
          256         99.9%      ( 0.8%)
          512         99.1%      ( 2.2%)
         1024         96.9%      (14.1%)
         2048         82.8%      (33.2%)
         4096         49.6%      (40.9%)
         8192          8.7%      ( 8.7%)
97000      mean = 47666      max = 177616
parameter      mean = 1.716929942156352      min = -7406.573279954264      max = 6257.871826491511
           64        100.0%      ( 0.4%)
          128         99.6%      ( 0.3%)
          256         99.3%      ( 1.8%)
          512         97.5%      ( 3.7%)
         1024         93.8%      (17.9%)
         2048         75.9%      (35.4%)
         4096         40.5%      (35.5%)
         8192          5.0%      ( 5.0%)
98000      mean = 54499      max = 177628
parameter      mean = 1.743440598860856      min = -7603.90460458944      max = 6365.018691973732
          128        100.0%      ( 0.3%)
          256         99.7%      ( 0.7%)
          512         99.0%      ( 3.1%)
         1024         95.9%      (14.2%)
         2048         81.7%      (32.1%)
         4096         49.6%      (41.2%)
         8192          8.4%      ( 8.4%)
99000      mean = 55059      max = 178140
parameter      mean = 1.7682309894805486      min = -7638.215075618973      max = 6565.107432410588
           32        100.0%      ( 0.1%)
           64         99.9%      ( 0.1%)
          128         99.8%      ( 0.2%)
          256         99.6%      ( 0.5%)
          512         99.1%      ( 2.3%)
         1024         96.8%      (16.2%)
         2048         80.6%      (30.2%)
         4096         50.4%      (42.4%)
         8192          8.0%      ( 8.0%)
100000      mean = 55167      max = 169020
parameter      mean = 1.7892723967535384      min = -7792.923437823282      max = 6686.816646242725
          256        100.0%      ( 0.1%)
          512         99.9%      ( 2.8%)
         1024         97.1%      (15.3%)
         2048         81.8%      (31.8%)
         4096         50.0%      (42.3%)
         8192          7.7%      ( 7.7%)

real    319m53.779s
user    319m48.061s
sys    0m4.412s

 

 

 

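On the parameter statistics of the n-tuple network: across the logged episodes up to 100000, the mean of the parameters changes little, while the maximum and minimum reach several thousand in the positive and negative directions, so the gap between the largest and smallest parameter is very large. I had previously reproduced another TDL method:

Revisiting the Game 2048: AI Methods, Origins and Endings (5): The First Reinforcement Learning Method for 2048, "Temporal Difference Learning of N-Tuple Networks for the Game 2048"

Code: https://gitee.com/devilmaycry812839668/td-tuple-net-for-2048

 

A major reason that earlier reproduction performed poorly was that its parameters overflowed in both directions. In this reproduction the gap between the largest and smallest parameter is still large, but since an episode of this game is long (thousands or even tens of thousands of steps), that is normal. Why did the parameters overflow in the earlier reproduction while they stay within a controllable range here? In my view, the main improvement is that this reproduction rotates and mirrors each feature to form 8 isomorphic features, and all 8 share a single lookup table. Each update therefore spreads its changes across the lookup table instead of concentrating them. This avoids the situation early in training where similar board states occur frequently and cause some lookup-table entries to be updated too many times, and so avoids overflow of the parameter values in the table. In other words, deriving 8 isomorphic features that share one lookup table from each feature is what makes the TDL algorithm work.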
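To make the idea of 8 isomorphic features sharing one lookup table concrete, here is a minimal sketch. It assumes a 4x4 board stored as a flat list of 16 cell values in row-major order, each value in the range 0 to 15, and it uses a small 4-cell pattern so that the table stays small. The names `isomorphisms` and `SharedFeature` are hypothetical and are not copied from either repository.

```python
def rotate(idx):
    """Map a cell index (0..15, row-major) to its position after a 90-degree clockwise turn."""
    r, c = divmod(idx, 4)
    return c * 4 + (3 - r)

def mirror(idx):
    """Map a cell index to its position after a left-right flip."""
    r, c = divmod(idx, 4)
    return r * 4 + (3 - c)

def isomorphisms(pattern):
    """Return the 8 rotated/mirrored versions of a tuple of cell indices."""
    result = []
    for flip in (False, True):
        cells = [mirror(i) for i in pattern] if flip else list(pattern)
        for _ in range(4):
            result.append(tuple(cells))
            cells = [rotate(i) for i in cells]
    return result

class SharedFeature:
    """One pattern, 8 isomorphic views, a single weight table shared by all views."""

    def __init__(self, pattern):
        self.isos = isomorphisms(pattern)
        self.weights = [0.0] * (16 ** len(pattern))

    def index(self, board, cells):
        # Pack the 4-bit cell values read through this view into one table index.
        idx = 0
        for i, cell in enumerate(cells):
            idx |= board[cell] << (4 * i)
        return idx

    def estimate(self, board):
        return sum(self.weights[self.index(board, c)] for c in self.isos)

    def update(self, board, delta):
        # Every view writes into the same table, so one board touches up to 8 entries.
        for c in self.isos:
            self.weights[self.index(board, c)] += delta

feature = SharedFeature((0, 1, 2, 3))   # top row as an example pattern
board = list(range(16))                 # hypothetical board of cell values
feature.update(board, 1.0)
print(len(set(feature.isos)), feature.estimate(board))
```

Because all 8 views index the same table, a board and its rotations or reflections read and write the same entries, which is the sharing described above.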

 

 

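The TDL algorithm here is essentially TD(0), and it can also be viewed as an algorithm of the same form as Q-learning. The state of 2048 is represented as data of the form [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], and state features of this kind are hard to use efficiently with a neural network. For 2048, the SOTA approach is therefore to use an N-Tuple Network as the numerical representation of the game state.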
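As an illustration of the TD(0) idea, the sketch below applies the update V(s) ← V(s) + α·(r + V(s') − V(s)) in a backward pass over one finished episode. It makes several assumptions that are not stated in the text above: each board cell stores the base-2 exponent of its tile, values are learned for afterstates (the board right after a move, before the random tile appears), and a plain dictionary stands in for the n-tuple network. The names `encode` and `td0_backward_update` are hypothetical.

```python
def encode(tiles):
    """Map tile values (0 for empty, otherwise a power of two) to exponents, e.g. 2048 -> 11."""
    return [t.bit_length() - 1 if t else 0 for t in tiles]

def td0_backward_update(path, value, alpha=0.1):
    """Apply TD(0) updates over one episode, from the last move back to the first.

    path:  list of (afterstate, reward) pairs in play order, where reward is the
           score gained by the move that produced the afterstate.
    value: dict mapping afterstate -> estimated future return (stand-in for the network).
    """
    target = 0.0  # nothing can be earned after the final afterstate
    for state, reward in reversed(path):
        error = target - value.get(state, 0.0)
        value[state] = value.get(state, 0.0) + alpha * error
        # The afterstate one move earlier should predict this move's reward plus this value.
        target = reward + value[state]
    return value

# Hypothetical two-move episode; states are tuples of exponents so they can be dict keys.
s1 = tuple(encode([2, 4, 0, 0] + [0] * 12))
s2 = tuple(encode([8, 0, 0, 0] + [0] * 12))
values = td0_backward_update([(s1, 4.0), (s2, 8.0)], {}, alpha=0.5)
print(values[s1], values[s2])
```

In a real n-tuple network the dictionary lookup and assignment would be replaced by summing and adjusting the lookup-table entries selected by each tuple.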

 

 

 

Note:

For the details of the TDL algorithm for 2048, see the paper "Temporal Difference Learning of N-Tuple Networks for the Game 2048".

 

 

 

=====================================================

 

 

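Judging by the results of the C++ and Python versions, this implementation is quite successful. The algorithm logic and parameter settings are kept fully consistent, and the results are comparable; the only difference is the total running time. The original C++ version needs 60 minutes, that is, one hour, to finish, while the Python version here needs about 300 minutes, that is, five hours, so the total running time is 5 times longer. Considering the characteristics of Python, this running time is entirely acceptable. Although the C++ version takes only one fifth of the Python version's time, the C++ code is harder to understand; I can read C++ code myself but cannot write it fluently. The Python version is hard to use in real applications, but it still has some reference value: it is the only Python implementation of the TDL algorithm for 2048 that I have been able to find online so far, and that absence is exactly why I wrote one in Python myself.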

 

 

=========================================

 

posted on 2022-08-22 22:35 by Angry_Panda
