Humanoid Robots: Normalizing Input Data in Reinforcement Learning Can Effectively Improve Algorithm Performance

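Related:

Humanoid Robots - Reinforcement Learning Algorithms - Do the Implementation Details of PPO Have a Large Impact on Algorithm Performance?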

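In the current AI era, normalizing the input is known to improve algorithm performance for multimedia data such as images. For control applications like humanoid robots, however, the inputs are usually sensor readings, and whether this kind of input should be normalized has not been studied systematically. This post gives a single example showing that, for a humanoid robot trained with reinforcement learning, normalizing the input data can effectively improve algorithm performance.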


Source code of the project used in the example:

https://openi.pcl.ac.cn/devilmaycry812839668/google_brax_ppo_pytorch

The code that normalizes the input data:

  @torch.jit.export
  def update_normalization(self, observation):
    # Statistics are pooled over the first two axes of observation.
    self.num_steps += observation.shape[0] * observation.shape[1]
    input_to_old_mean = observation - self.running_mean
    mean_diff = torch.sum(input_to_old_mean / self.num_steps, dim=(0, 1))
    self.running_mean = self.running_mean + mean_diff
    input_to_new_mean = observation - self.running_mean
    # running_variance accumulates the sum of squared deviations;
    # it is divided by the step count only inside normalize().
    var_diff = torch.sum(input_to_new_mean * input_to_old_mean, dim=(0, 1))
    self.running_variance = self.running_variance + var_diff

  # Variant 1: normalization enabled.
  @torch.jit.export
  def normalize(self, observation):
    variance = self.running_variance / (self.num_steps + 1.0)
    variance = torch.clip(variance, 1e-6, 1e6)
    return ((observation - self.running_mean) / variance.sqrt()).clip(-5, 5)

  # Variant 2: normalization disabled (use in place of variant 1).
  @torch.jit.export
  def normalize(self, observation):
    # variance = self.running_variance / (self.num_steps + 1.0)
    # variance = torch.clip(variance, 1e-6, 1e6)
    # return ((observation - self.running_mean) / variance.sqrt()).clip(-5, 5)
    return observation

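As the code shows, the normalize method whose body is commented out is the version that does not normalize the input; the other normalize method is the version that does.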
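The update rule in update_normalization can be checked in isolation. Below is a minimal sketch in plain Python for a single scalar feature, not the project's implementation: the class name `RunningNormalizer`, the attribute name `running_sq_dev` (the post calls it `running_variance`) and the zero initial state are assumptions made for illustration. It feeds the samples in uneven batches and compares the running statistics against a direct computation over all samples.

```python
import math


class RunningNormalizer:
    """Running statistics for one scalar feature (illustrative sketch)."""

    def __init__(self):
        # Assumed initial state: no samples seen, zero mean, zero deviation sum.
        self.num_steps = 0
        self.running_mean = 0.0
        self.running_sq_dev = 0.0  # sum of squared deviations from the mean

    def update(self, batch):
        # Same order of operations as update_normalization in the post.
        self.num_steps += len(batch)
        to_old_mean = [x - self.running_mean for x in batch]
        self.running_mean += sum(d / self.num_steps for d in to_old_mean)
        to_new_mean = [x - self.running_mean for x in batch]
        self.running_sq_dev += sum(n * o for n, o in zip(to_new_mean, to_old_mean))

    def normalize(self, x, clip=5.0):
        # Divide by (num_steps + 1), clamp the variance, then clamp the result.
        variance = self.running_sq_dev / (self.num_steps + 1.0)
        variance = min(max(variance, 1e-6), 1e6)
        z = (x - self.running_mean) / math.sqrt(variance)
        return min(max(z, -clip), clip)


samples = [0.5, 2.0, -1.5, 3.0, 4.5, 0.0, 2.5, 1.0]

norm = RunningNormalizer()
for start in range(0, len(samples), 3):  # batches of 3, 3 and 2 samples
    norm.update(samples[start:start + 3])

# Direct computation over the full sample list for comparison.
direct_mean = sum(samples) / len(samples)
direct_sq_dev = sum((x - direct_mean) ** 2 for x in samples)

print(norm.running_mean, direct_mean)
print(norm.running_sq_dev, direct_sq_dev)
print(norm.normalize(2.0), norm.normalize(1e9))
```

The running mean and the accumulated squared deviations agree with the direct computation up to floating-point rounding, so the batched update loses no information compared with storing every observation. The final clamp to [-5, 5] keeps a single extreme sensor reading from producing a very large network input.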


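Performance comparison:

For the results obtained with input normalization, see:

Humanoid Robots - Reinforcement Learning Algorithms - Do the Implementation Details of PPO Have a Large Impact on Algorithm Performance?

Excerpt: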

-257.35278 626.76746 872.1776 1725.8082 2591.6821 3190.6335 3620.4314 4082.5015 4468.4927 4762.6313 4986.739
-189.53021 588.52185 522.61017 1434.2548 2458.558 3038.153 3335.4653 3691.8052 4317.5156 5113.4204 5509.2427
-176.273 560.4658 743.65686 1602.8649 2622.3098 2960.6648 3334.305 3743.382 4042.1658 4236.974 4528.8677
-197.11514 693.2258 1242.5767 2034.826 2612.6963 2984.8345 3413.858 3686.592 4052.4087 4459.6577 4780.1724
-207.15936 544.1607 698.48737 1465.3678 2158.609 2698.9854 2985.0945 3370.756 3502.2546 3557.984 3639.0762
-247.04488 609.49286 886.9049 1622.353 2578.5637 3369.3296 3897.0066 4506.5947 4957.028 5286.8623 5497.941
-279.18222 695.9527 781.5837 1851.9932 2501.7515 3050.4778 3484.7144 3728.6135 4007.2332 4429.4478 4524.212
-183.95197 539.9428 703.08484 1465.9211 2428.05 2860.103 3250.7612 3718.4924 4047.587 4484.833 4805.9463
-295.8933 576.68585 886.04565 1722.0039 2508.7786 2791.8215 3169.9558 3641.8894 4151.5737 4636.6797 5302.451
-36.912907 599.4662 668.3243 1792.368 2677.549 2945.6028 3446.7866 3810.89 4212.9053 4442.2866 4756.253


The results obtained without input normalization are as follows:

-77.85561 438.34476 794.2656 792.8977 957.17664 1299.5475 1671.1133 1804.6962 2157.206 2700.303 3020.0437
-91.65347 445.0818 715.5132 658.87396 771.9714 1484.525 1777.1486 1866.5006 2155.879 2346.3665 2604.5776
-54.358955 472.08698 726.3028 682.5898 767.46655 1288.3954 1508.9557 1689.5298 1843.5364 2143.8035 2621.5916
-113.30742 429.77826 558.06006 546.07886 902.2285 1396.169 1925.1675 2173.3057 2555.192 2633.8794 2793.6975
-104.48749 440.79535 746.394 760.568 893.1498 1455.1602 1938.6824 2323.6711 2387.4153 2592.8958 2433.0327
-68.385895 486.05206 741.18823 617.3225 851.3968 1078.9066 1606.7587 1588.2295 1796.87 1858.1749 2129.9033
-131.3798 415.31247 697.7362 658.2336 900.6837 1351.4711 1996.3832 2236.146 2536.4805 3153.6646 3416.0862
-120.99256 413.58728 626.3506 687.95013 831.1084 1025.0422 1500.929 1753.8656 1871.0732 2011.8903 2641.913
-75.45705 434.28647 693.9142 634.13086 889.674 1296.5042 1882.7454 2164.5344 2467.4182 2745.7888 3035.4177
-114.68898 407.845 761.67566 713.68396 852.0756 1375.5042 2069.7917 2339.376 2588.798 2725.4526 2557.2778

-101.7523 414.86453 739.1547 620.4246 941.0865 1567.3666 1911.1805 2269.221 2500.7324 2582.7607 2771.145
-86.759964 450.10226 778.59875 631.0722 916.60474 1204.1487 1551.1722 2049.6711 2148.086 2565.2358 2978.179
-137.24196 419.82333 755.31555 633.0998 990.715 1377.3263 1823.6952 2302.8936 2653.6897 2942.6562 3467.5632
-58.727757 455.91443 684.72003 815.42944 904.37 1077.3318 1279.6232 1409.0516 1847.3365 1835.936 1919.0789
-82.06487 438.08466 774.07227 494.94788 744.4304 1202.0668 1658.3977 2439.672 2782.2876 2768.5366 2827.773
-67.98854 473.5706 809.3946 843.9213 890.9753 1349.2875 1436.7722 1739.1041 2343.863 2396.6265 2619.9243
-146.65645 377.01434 744.7471 638.74554 923.1421 1646.3494 1954.8031 2045.2708 2119.6357 2463.2346 2897.6926
-87.15275 428.5989 790.8215 702.055 933.6097 1429.7357 1797.0962 2513.0598 2753.76 3387.56 3623.7625
-98.770515 457.0335 650.9234 573.8964 902.82605 1250.546 1681.1133 2001.0364 2475.162 2694.0935 2646.2554
-123.9271 410.84375 704.1983 593.13965 785.35803 1312.5563 1832.3549 2063.9033 2190.7776 2377.4612 2563.7488

-80.494736 407.7025 754.82404 618.2482 738.49335 996.0117 1486.0201 1590.1384 1897.3383 2123.903 2689.531
-133.89116 411.3381 704.5826 626.9709 781.57117 1429.573 1710.6082 2357.2805 2472.3542 2871.745 2997.374
-98.128235 423.15836 809.9427 838.063 949.5994 1744.5876 1972.8042 2194.3274 2271.6975 2354.5994 2753.2034
-102.76378 449.34274 646.9761 684.3824 907.6406 1209.5072 1672.5498 1574.901 2324.8418 2509.083 2649.5742
-78.073265 429.3245 719.81805 776.617 786.8812 991.8602 1284.5616 1622.5997 1922.5325 2153.3987 2444.3247
-140.06128 423.53253 742.2427 660.3897 983.35315 1524.5262 2302.9036 2543.2527 2925.0925 3201.347 3690.422
-138.25327 431.2104 596.2153 788.4841 972.05066 1147.9867 1361.5687 1885.2491 1950.9135 2710.2532 3254.82
-108.223175 446.30673 821.2198 763.8977 1033.9232 1736.8658 2305.6277 2483.5605 2975.572 2916.3809 3298.6462
-99.79859 404.25638 716.65204 809.3846 1191.8788 1731.1399 2089.4958 2296.3547 2709.287 2705.8142 3267.0093
-149.51466 396.21622 701.3763 596.6839 710.11194 1027.3427 1547.9878 1770.2979 2143.0483 2367.6943 2674.2273

-87.8204 421.0942 663.3163 659.74 866.9735 1292.325 2082.6216 2311.047 2354.8962 2648.5793 2907.729
-112.66196 405.9511 638.07837 652.81757 811.2735 1013.2833 1227.4052 1570.299 1991.3562 2248.4443 2374.6284
-98.86592 424.6044 766.20087 625.7965 833.9125 1696.7382 2245.1821 2408.9285 2512.2449 2607.1687 2472.5034
-104.108734 428.84222 778.0977 532.83167 865.2107 1307.252 1705.8304 2165.4087 2651.0496 2826.1138 3220.217
-80.21033 468.32587 766.8617 696.1783 968.42145 1195.2903 1731.434 1575.0563 1954.1262 2172.0479 2421.1565
-89.06305 450.88275 817.31067 750.5467 774.7103 1287.5636 1749.2317 2456.3752 2668.226 2850.1719 2836.5847
-98.60692 461.25327 721.6416 718.8977 854.6812 1358.3721 1978.8242 2339.2546 2478.2566 2473.155 2661.2944
-159.49805 397.19608 844.1305 723.54553 978.086 1811.628 2256.0168 2360.2686 2626.4612 2818.376 2803.0671
-140.09673 428.50305 785.77454 603.5179 999.2825 1369.0245 1828.2902 2201.4844 2176.6523 2479.9292 2505.437
-127.00579 407.35452 768.13916 817.6076 977.2075 1316.0947 1613.8751 1896.7802 2112.27 2154.151 2565.1787



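Comparing the two sets of results, the conclusion is easy to draw: for robot control problems such as humanoid robots, normalizing the input data also effectively improves algorithm performance. The last value in each row averages about 4833 over the 10 runs with normalization, against about 2801 over the 40 runs without it. This suggests that input normalization applies equally well to sensor-type data.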
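One way to put a number on the comparison is to average the last value of every row in the two result listings above. The short sketch below does only that. It assumes that each row is one independent training run and that the last value is the final evaluation score; the post does not label the columns, so this reading is an assumption. The numbers are copied from the listings.

```python
# Last value of each row with input normalization (10 runs).
with_norm = [
    4986.739, 5509.2427, 4528.8677, 4780.1724, 3639.0762,
    5497.941, 4524.212, 4805.9463, 5302.451, 4756.253,
]

# Last value of each row without input normalization (40 runs).
without_norm = [
    3020.0437, 2604.5776, 2621.5916, 2793.6975, 2433.0327,
    2129.9033, 3416.0862, 2641.913, 3035.4177, 2557.2778,
    2771.145, 2978.179, 3467.5632, 1919.0789, 2827.773,
    2619.9243, 2897.6926, 3623.7625, 2646.2554, 2563.7488,
    2689.531, 2997.374, 2753.2034, 2649.5742, 2444.3247,
    3690.422, 3254.82, 3298.6462, 3267.0093, 2674.2273,
    2907.729, 2374.6284, 2472.5034, 3220.217, 2421.1565,
    2836.5847, 2661.2944, 2803.0671, 2505.437, 2565.1787,
]


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


mean_with = mean(with_norm)
mean_without = mean(without_norm)

print(f"with normalization:    mean {mean_with:.2f}, min {min(with_norm):.2f}")
print(f"without normalization: mean {mean_without:.2f}, max {max(without_norm):.2f}")
```

The two means differ by roughly 2000. The ranges do overlap slightly, since the weakest normalized run (3639.08) ends just below the strongest unnormalized run (3690.42), so the gap is a statement about typical runs rather than every individual run.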



Personal GitHub blog:
https://devilmaycry812839668.github.io/

posted on 2024-11-23 21:50  Angry_Panda
