Notes: perf topdown

perf list metric
perf list metricgroup | grep tma 

Method 1: use perf's built-in --topdown option or the TopdownL1/TopdownL2 metric groups:

perf stat --topdown --no-metric-only -C 0 taskset -c 0 /data/products/hpc001/app_inc2
perf stat --topdown --td-level 1 --no-metric-only -C 0 taskset -c 0 /data/products/hpc001/app_inc2
perf stat --topdown --td-level 2 --no-metric-only -C 0 taskset -c 0 /data/products/hpc001/app_inc2
perf stat --topdown --td-level 3 --no-metric-only -C 0 taskset -c 0 /data/products/hpc001/app_inc2
perf stat -C 0 -M TopdownL1  taskset -c 0 /data/products/hpc001/app_inc2
perf stat -C 0 -M TopdownL2  taskset -c 0 /data/products/hpc001/app_inc2
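
The per-level commands above differ only in the --td-level value, so they can be generated in a loop. The helper below is a minimal sketch, not part of the original notes: the function name run_topdown_levels and the DRY_RUN switch are made up for illustration, and the levels your perf build and CPU accept may differ.

```shell
# Run `perf stat --topdown` once per level for the same pinned workload.
# Usage: run_topdown_levels CPU MAX_LEVEL COMMAND [ARGS...]
# Set DRY_RUN=1 to print the perf commands instead of executing them.
run_topdown_levels() {
    td_cpu=$1
    td_max=$2
    shift 2
    td_level=1
    while [ "$td_level" -le "$td_max" ]; do
        # ${DRY_RUN:+echo} expands to `echo` only when DRY_RUN is non-empty.
        ${DRY_RUN:+echo} perf stat --topdown --td-level "$td_level" \
            --no-metric-only -C "$td_cpu" taskset -c "$td_cpu" "$@"
        td_level=$((td_level + 1))
    done
}
```

For example, `DRY_RUN=1 run_topdown_levels 0 3 /data/products/hpc001/app_inc2` prints the three --td-level commands listed above without running them.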

Method 2: use the raw encodings of the corresponding CPU events. pmu-tools is believed to drive perf this way for its topdown analysis (to be verified).
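
As a rough illustration of what "raw encodings" means here, perf accepts an event as a cpu/event=...,umask=.../ term. The sketch below only formats such a term; raw_event is a hypothetical helper, and the encodings in the usage comment are assumed Skylake-family values that should be checked with `perf list --details` on the target machine.

```shell
# Format an event-select/umask pair as a perf raw PMU event term.
# Usage: raw_event EVENT_SELECT UMASK
raw_event() {
    printf 'cpu/event=%s,umask=%s/' "$1" "$2"
}

# Hypothetical usage (encodings are assumptions, not taken from the notes):
#   0x9c/0x01 for IDQ_UOPS_NOT_DELIVERED.CORE, 0x0e/0x01 for UOPS_ISSUED.ANY
#   perf stat -C 0 -e "$(raw_event 0x9c 0x01)" -e "$(raw_event 0x0e 0x01)" \
#       taskset -c 0 /data/products/hpc001/app_inc2
```

The metric percentages then have to be computed by hand from the raw counts, which is the work the --topdown option and the tma_* metrics do automatically in Method 1.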

+++

perf stat -r 10 --topdown --no-metric-only  -C 0 taskset -c 0 /data/products/hpc001/app_inc2
 Performance counter stats for 'CPU(s) 0' (10 runs):

           534,646      CPU_CLK_UNHALTED.REF_XCLK        #      2.4 %  tma_frontend_bound     
                                                  #     71.4 %  tma_retiring           
                                                  #     25.0 %  tma_backend_bound      
                                                  #      1.2 %  tma_bad_speculation      ( +-  0.96% )  (54.90%)
         8,896,957      IDQ_UOPS_NOT_DELIVERED.CORE                                             ( +- 13.55% )  (59.16%)
           230,281      INT_MISC.RECOVERY_CYCLES_ANY                                            ( +-  7.92% )  (61.57%)
           499,734      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE                                        ( +-  1.48% )  (61.57%)
        94,660,854      CPU_CLK_UNHALTED.THREAD                                                 ( +-  0.82% )  (57.94%)
       261,524,858      UOPS_RETIRED.RETIRE_SLOTS                                               ( +-  0.35% )  (53.68%)
       265,470,310      UOPS_ISSUED.ANY                                                         ( +-  0.75% )  (51.25%)

          0.023888 +- 0.000313 seconds time elapsed  ( +-  1.31% )
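
The four percentages can be recomputed from the raw counts in this output. The sketch below assumes the Skylake-family TMA level-1 formulas with SMT enabled (4 issue slots per core clock); these formulas are not stated in the notes and are an assumption here. The counts are multiplexed and averaged over 10 runs, so agreement to one decimal place is the most to expect.

```shell
# Recompute the level-1 topdown split from raw event counts.
# Formulas are assumed Skylake-family TMA definitions with SMT enabled.
# Args: REF_XCLK NOT_DELIVERED RECOVERY_ANY ONE_THREAD_ACTIVE THREAD RETIRE_SLOTS ISSUED
topdown_l1() {
    awk -v ref_xclk="$1" -v not_delivered="$2" -v recovery_any="$3" \
        -v one_thread="$4" -v thread_clk="$5" -v retire_slots="$6" \
        -v issued="$7" 'BEGIN {
        # Core clocks with SMT on, then 4 issue slots per core clock.
        core_clks = thread_clk / 2 * (1 + one_thread / ref_xclk)
        slots = 4 * core_clks
        frontend = not_delivered / slots
        retiring = retire_slots / slots
        # With SMT on, half of the any-thread recovery cycles are charged.
        bad_spec = (issued - retire_slots + 4 * recovery_any / 2) / slots
        backend = 1 - frontend - retiring - bad_spec
        printf "frontend=%.1f retiring=%.1f backend=%.1f bad_spec=%.1f\n",
            100 * frontend, 100 * retiring, 100 * backend, 100 * bad_spec
    }'
}

# Counts copied from the perf output above.
topdown_l1 534646 8896957 230281 499734 94660854 261524858 265470310
```

With the counts above this prints frontend=2.4 retiring=71.4 backend=25.0 bad_spec=1.2, the same values perf reports for the tma_* metrics.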

To expand tma_backend_bound, append _group to the metric name:

perf stat -r 10 -M tma_backend_bound_group --no-metric-only  -C 0 taskset -c 0 /data/products/hpc001/app_inc2
 Performance counter stats for 'CPU(s) 0' (10 runs):

           507,223      CPU_CLK_UNHALTED.REF_XCLK        #     30.3 %  tma_core_bound         
                                                  #      0.2 %  tma_memory_bound         ( +-  0.97% )  (30.41%)
            48,375      EXE_ACTIVITY.BOUND_ON_STORES                                            ( +- 14.02% )  (30.43%)
        11,606,736      IDQ_UOPS_NOT_DELIVERED.CORE                                             ( +-  5.62% )  (30.42%)
         4,339,363      EXE_ACTIVITY.1_PORTS_UTIL                                               ( +-  1.61% )  (32.16%)
           278,979      INT_MISC.RECOVERY_CYCLES_ANY                                            ( +- 12.87% )  (34.79%)
           520,047      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE                                        ( +-  1.01% )  (34.79%)
        97,267,311      CPU_CLK_UNHALTED.THREAD                                                 ( +-  1.06% )  (34.80%)
           139,173      CYCLE_ACTIVITY.STALLS_MEM_ANY                                           ( +-  9.81% )  (34.80%)
       266,737,007      UOPS_RETIRED.RETIRE_SLOTS                                               ( +-  0.94% )  (34.81%)
           532,583      CYCLE_ACTIVITY.STALLS_TOTAL                                             ( +-  9.07% )  (34.81%)
        30,984,874      EXE_ACTIVITY.2_PORTS_UTIL                                               ( +-  1.13% )  (34.81%)
       261,568,289      UOPS_ISSUED.ANY                                                         ( +-  0.75% )  (33.08%)

          0.024295 +- 0.000345 seconds time elapsed  ( +-  1.42% )
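
The split between tma_core_bound and tma_memory_bound can be checked against the raw counts in the same way. The sketch below assumes the Skylake-family definition in which memory_bound is a fraction of backend_bound and core_bound is the remainder; the formulas and the function name backend_split are assumptions, not taken from the notes.

```shell
# Split backend_bound into core_bound and memory_bound from raw counts.
# Formulas are assumed Skylake-family TMA definitions with SMT enabled.
# Args: REF_XCLK BOUND_ON_STORES NOT_DELIVERED PORTS_1 RECOVERY_ANY
#       ONE_THREAD_ACTIVE THREAD STALLS_MEM_ANY RETIRE_SLOTS STALLS_TOTAL
#       PORTS_2 ISSUED
backend_split() {
    awk -v ref_xclk="$1" -v bound_on_stores="$2" -v not_delivered="$3" \
        -v ports_1="$4" -v recovery_any="$5" -v one_thread="$6" \
        -v thread_clk="$7" -v stalls_mem_any="$8" -v retire_slots="$9" \
        -v stalls_total="${10}" -v ports_2="${11}" -v issued="${12}" 'BEGIN {
        slots = 4 * thread_clk / 2 * (1 + one_thread / ref_xclk)
        frontend = not_delivered / slots
        retiring = retire_slots / slots
        backend = 1 - frontend - (issued + 4 * recovery_any / 2) / slots
        # Share of backend stalls attributed to memory (loads and stores).
        mem_fraction = (stalls_mem_any + bound_on_stores) / \
            (stalls_total + ports_1 + retiring * ports_2 + bound_on_stores)
        memory = mem_fraction * backend
        core = backend - memory
        if (core < 0) core = 0
        printf "core_bound=%.1f memory_bound=%.1f\n", 100 * core, 100 * memory
    }'
}

# Counts copied from the perf output above.
backend_split 507223 48375 11606736 4339363 278979 520047 97267311 \
    139173 266737007 532583 30984874 261568289
```

This prints core_bound=30.3 memory_bound=0.2, matching the perf output. Each event here was counted only about 30-35% of the time (the last column), so the underlying counts are scaled estimates.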

Continuing one level further, into tma_core_bound:

perf stat -r 10 -M tma_core_bound_group --no-metric-only  -C 0 taskset -c 0 /data/products/hpc001/app_inc2
 Performance counter stats for 'CPU(s) 0' (10 runs):

           511,761      CPU_CLK_UNHALTED.REF_XCLK        #      0.0 %  tma_divider            
                                                  #     27.1 %  tma_ports_utilization    ( +-  1.93% )  (17.56%)
           273,364      EXE_ACTIVITY.EXE_BOUND_0_PORTS                                          ( +- 15.28% )  (21.87%)
        10,672,153      IDQ_UOPS_NOT_DELIVERED.CORE                                             ( +- 19.64% )  (26.19%)
            65,904      EXE_ACTIVITY.BOUND_ON_STORES                                            ( +- 25.16% )  (30.51%)
         4,470,767      EXE_ACTIVITY.1_PORTS_UTIL                                               ( +-  2.92% )  (34.63%)
           236,840      INT_MISC.RECOVERY_CYCLES_ANY                                            ( +- 30.52% )  (34.65%)
           503,697      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE                                        ( +-  4.93% )  (34.65%)
        98,359,800      CPU_CLK_UNHALTED.THREAD                                                 ( +-  1.25% )  (34.65%)
       265,621,980      UOPS_RETIRED.RETIRE_SLOTS                                               ( +-  1.27% )  (34.65%)
           383,769      CYCLE_ACTIVITY.STALLS_MEM_ANY                                           ( +- 16.89% )  (34.66%)
        32,263,932      EXE_ACTIVITY.2_PORTS_UTIL                                               ( +-  3.21% )  (30.54%)
         1,823,415      CYCLE_ACTIVITY.STALLS_TOTAL                                             ( +- 16.95% )  (26.21%)
       261,464,041      UOPS_ISSUED.ANY                                                         ( +-  1.34% )  (21.88%)
             4,759      ARITH.DIVIDER_ACTIVE                                                    ( +-  9.02% )  (17.54%)

          0.024391 +- 0.000606 seconds time elapsed  ( +-  2.48% )
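
The two children of tma_core_bound can also be recomputed from the counts above. The sketch below assumes the older Skylake-family definitions that use EXE_ACTIVITY.EXE_BOUND_0_PORTS, with both metrics normalised by CPU_CLK_UNHALTED.THREAD; the formulas and the function name core_bound_split are assumptions rather than something the notes state.

```shell
# Recompute tma_divider and tma_ports_utilization from raw counts.
# Formulas are assumed Skylake-family TMA definitions with SMT enabled.
# Args: REF_XCLK PORTS_0 PORTS_1 ONE_THREAD_ACTIVE THREAD RETIRE_SLOTS
#       STALLS_MEM_ANY PORTS_2 STALLS_TOTAL DIVIDER_ACTIVE
core_bound_split() {
    awk -v ref_xclk="$1" -v ports_0="$2" -v ports_1="$3" \
        -v one_thread="$4" -v thread_clk="$5" -v retire_slots="$6" \
        -v stalls_mem_any="$7" -v ports_2="$8" -v stalls_total="$9" \
        -v divider_active="${10}" 'BEGIN {
        slots = 4 * thread_clk / 2 * (1 + one_thread / ref_xclk)
        retiring = retire_slots / slots
        few_ports = ports_1 + retiring * ports_2
        # Assumed rule: zero-port cycles are added only if divider-active
        # cycles are fewer than the non-memory stall cycles.
        if (divider_active < stalls_total - stalls_mem_any)
            few_ports += ports_0
        printf "divider=%.1f ports_utilization=%.1f\n",
            100 * divider_active / thread_clk, 100 * few_ports / thread_clk
    }'
}

# Counts copied from the perf output above.
core_bound_split 511761 273364 4470767 503697 98359800 265621980 \
    383769 32263932 1823415 4759
```

This prints divider=0.0 ports_utilization=27.1, in line with the perf output, which suggests the core-bound time in this workload comes from execution-port pressure rather than the divider.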
posted @ 2025-01-08 14:13  LiYanbin