A code demo of Profile-Guided Optimization (PGO)
The shell script is named step:
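#!/bin/bash
# Usage: step N   (N = 0, 1, 2 or 3)
if [[ $# -ne 1 ]]; then exit 1; fi

# Build main.cpp with the given flags, print the command, then run the binary
run() { C="g++ $1 main.cpp"; echo $C; $C; ./a.out; }

case $1 in
  '0') run '' ;;
  '1') run '-fprofile-generate=.' ;;
  '2') run '-fprofile-use=.' ;;
  # -march=native is not a default option and must be given explicitly; it
  # enables every instruction-set extension supported by the CPU of the
  # machine doing the compile.
  '3') run '-O3 -march=native' ;;
esac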
main.cpp
#include <chrono>   // C++11 time library
#include <cstdlib>  // exit
#include <iostream>
#include <random>
using namespace std;

auto now = chrono::high_resolution_clock::now;

// Simulated hot branch (the main thing PGO should optimize)
int process(int x) {
    if (x > 100)      // true about 90% of the time (the hot branch)
        return x * 2;
    return x + 1;
}

int main() {
    random_device rd;
    auto seed = rd();
    mt19937 gen(seed);
    // Bernoulli-distributed random bool: true with probability 0.9
    bernoulli_distribution dist(0.9);
    auto start = now();
    // unsigned: a larger loop count then wraps instead of overflowing (UB)
    unsigned sum = 0;
    for (int i = 0; i < 1'000'000; ++i)
        sum += process(dist(gen) ? 150 : 50);
    cout << chrono::duration<double>(now() - start).count() << "s\n";
    exit(sum);  // use sum as the exit status so the loop's result is observable
}
Running it (Intel N100):
~/pgo$ step 0
g++ main.cpp
0.0532354s
~/pgo$ step 1
g++ -fprofile-generate=. main.cpp
0.060655s
~/pgo$ step 2
g++ -fprofile-use=. main.cpp
0.0532936s
~/pgo$ step 3
g++ -O3 -march=native main.cpp
0.00564738s
colinsblog wrote: "I experimented with some existing C++ 14 applications I’ve written. One, a flat-file to Parquet format converter, improved by only about five percent over an executable built with blanket -O3 optimization levels. Another, the “DCP” I’ve discussed before, improved by around thirty percent faster compared with the same program built with -O3. These tests were done with GCC 5.4, not exactly the newest. I’ll attempt to do similar tests with GCC 7 and 9."
I'm using gcc version 12.2.0 (Debian 12.2.0-14).
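The AI says: PGO is a compiler optimization technique that analyzes data from real program runs (such as function call frequencies and branch paths) to produce a profile, which then guides the compiler in optimizing the hot code precisely. Its core workflow is: instrumented build → data collection → optimized rebuild. PGO can noticeably improve performance, for example by reducing branch mispredictions and by improving code layout for a better cache hit rate; in applications such as Chrome it has delivered 10%–20% speedups. The technique suits compute-intensive workloads and needs representative training data to be most effective.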
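For comparison, the branch probability that PGO learns for process() can also be stated by hand. The sketch below uses the C++20 [[likely]] attribute, a manual technique swapped in for illustration and not what -fprofile-use does internally; the name process_hinted is hypothetical, and the 90% figure is an assumption carried over from the demo's bernoulli_distribution(0.9). Compile with -std=c++20.

```cpp
// Manual branch hint: a hand-written stand-in for the branch probability
// that -fprofile-use would otherwise read from the .gcda profile.
// Hypothetical helper; requires C++20 (g++ -std=c++20).
int process_hinted(int x) {
    if (x > 100) [[likely]]   // assumed hot path (about 90% in the demo)
        return x * 2;
    return x + 1;             // cold path
}
```

The drawback of hand hints is that they go stale when the input distribution changes, whereas PGO re-derives them from each new training run.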
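AI is great: Chinese-language web pages about PGO hardly ever show any code, while AI can write a demo program and a Makefile. It did, however, treat the argument of -fprofile-generate and -fprofile-use as a file name (it is actually the directory holding the .gcda profile files), so the -fprofile-use build complained that it could not find the file.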
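Patch: run rm *.gcda before step 1; otherwise the counts left over from earlier instrumented runs are merged into the new profile.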
With the loop bound raised to i < 15'000'000, and -O3 -march=native also added to steps 1 and 2:
~/pgo$ step 0
g++ main.cpp
0.797331s
~/pgo$ step 1
g++ -O3 -march=native -fprofile-generate=. main.cpp
0.273545s
~/pgo$ step 2
g++ -O3 -march=native -fprofile-use=. main.cpp
0.138443s
~/pgo$ step 3
g++ -O3 -march=native main.cpp
0.0843469s
Checking a.out's exit status with echo $? shows that it does indeed change.
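Why the status changes: main.cpp seeds mt19937 from random_device, so each run draws a different sequence and ends with a different sum, and exit(sum) hands that sum to the shell. POSIX only passes the low 8 bits of the status to the parent, so echo $? prints sum & 0xFF. Making the sum observable this way also means the compiler cannot drop the loop as dead code. The sketch below is a minimal reproduction assuming a caller-supplied fixed seed instead of random_device; run_loop and shell_status are hypothetical helper names.

```cpp
#include <random>

// Same loop as main.cpp, but with a caller-supplied seed (hypothetical helper)
// so that two calls with the same seed produce the same sum.
unsigned run_loop(unsigned seed, int n) {
    std::mt19937 gen(seed);
    std::bernoulli_distribution dist(0.9);   // true with probability 0.9
    unsigned sum = 0;
    for (int i = 0; i < n; ++i) {
        int x = dist(gen) ? 150 : 50;
        sum += (x > 100) ? x * 2 : x + 1;    // same logic as process()
    }
    return sum;
}

// What `echo $?` prints after exit(sum): only the low 8 bits survive.
int shell_status(unsigned sum) {
    return static_cast<int>(sum & 0xFFu);
}
```

Each iteration adds either 300 or 51, so for n iterations the sum lies between 51·n and 300·n; for the 1'000'000-iteration build, shell_status(run_loop(seed, 1'000'000)) should match echo $? for a run that happened to draw the same seed.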
Pikafish's PGO build is quite a spectacle:
Step 1/4. Building instrumented executable ...
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17 -fprofile-generate=profdir -pedantic -Wextra -Wshadow -Wmissing-declarations -m64 -DUSE_PTHREADS -DNDEBUG -O3 -funroll-loops -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -DUSE_AVX2 -mavx2 -mbmi -DUSE_SSE41 -msse4.1 -DUSE_SSSE3 -mssse3 -DUSE_SSE2 -msse2 -DUSE_PEXT -mbmi2 -DARCH=x86-64-bmi2 -flto -flto-partition=one -c -o thread.o thread.cpp
