OpenMP Study Notes
A First Look at OpenMP
Example program: Hello World
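#include <stdio.h>
#include <omp.h>
#include <windows.h> // Sleep(), used by the later examples (Windows only)

void helloworld() {
    omp_set_num_threads(4); // request a team of 4 threads
    int nthreads, threadid;
    #pragma omp parallel private(nthreads,threadid)
    {
        threadid = omp_get_thread_num();
        printf("Hello World from OMP thread %d\n", threadid);
        if (threadid == 0) { // only thread 0 reports the team size
            nthreads = omp_get_num_threads();
            printf("Numbers of threads: %d\n", nthreads);
        }
    }
}
Output: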
Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 2
Hello World from OMP thread 1
Hello World from OMP thread 3
Output of a second run:
Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 3
Hello World from OMP thread 1
Hello World from OMP thread 2
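Conclusions:
- In both runs thread 0 (the master thread) printed first, most likely because it is already running when the parallel region begins and does not have to be woken up. OpenMP does not guarantee this order.
- The other threads start in no fixed order, and the order can change from run to run depending on how they are scheduled.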
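Because the print order varies between runs, a check that does not depend on it can be handy. The following is a minimal sketch, not part of the original notes: the helper name collect_ids and its cap parameter are assumptions made for illustration. Each thread writes its own id into the slot indexed by that id, so the result is the same on every run. The #ifdef _OPENMP guards let the file build without OpenMP as well, in which case the team has a single thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: every thread stores its id in ids[id].
   Returns the team size actually used (never more than cap). */
int collect_ids(int *ids, int cap) {
    assert(ids != 0 && cap > 0);
    int team = 1;                        /* shared; written by one thread only */
    #pragma omp parallel num_threads(cap)
    {
#ifdef _OPENMP
        int id = omp_get_thread_num();   /* private: declared inside the region */
        #pragma omp single
        team = omp_get_num_threads();    /* one thread records the team size */
#else
        int id = 0;                      /* no OpenMP: a single thread */
#endif
        ids[id] = id;                    /* one distinct slot per thread, no race */
    }
    return team;
}
```

Declaring id inside the parallel region makes it private without a private clause, which is an alternative to the private(nthreads,threadid) style used in Hello World.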
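Directive syntax
#pragma omp directive-name [clause,...] newline
- #pragma omp: the directive prefix, always written this way
- directive-name: the directive itself, e.g. parallel in Hello World
- [clause,...]: optional clauses, e.g. private(nthreads,threadid) in Hello World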
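Work-sharing constructs
- Parallel for loop
- Parallel sections
- Serial execution (single)
Parallel for loop
Syntax: #pragma omp for [clause,...] newline
Purpose: the iterations of the loop that immediately follows are divided among the threads of the team and executed in parallel
Example program: for Example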
void forexample() {
    omp_set_num_threads(4);
    int i;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 6; i++)
        {
            printf("i = %d, threadid = %d\n", i, omp_get_thread_num());
            Sleep(10); // Windows API from <windows.h>, takes milliseconds
        }
    }
}
Output:
i = 0, threadid = 0
i = 4, threadid = 2
i = 2, threadid = 1
i = 5, threadid = 3
i = 1, threadid = 0
i = 3, threadid = 1
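Conclusions:
- The unit of work is one loop iteration, and the iterations are split as evenly as possible among the threads (when there are more threads than iterations, some threads get none).
- In this run the iterations were handed out as contiguous blocks in thread-id order: thread 0 got i = 0, 1, thread 1 got i = 2, 3, thread 2 got i = 4 and thread 3 got i = 5. This matches static scheduling; with no schedule clause the schedule is implementation-defined.
- When a parallel region contains nothing but one for loop, the two directives can be combined into #pragma omp parallel for, which must be immediately followed by the for loop.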
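To inspect the distribution without relying on interleaved printf output, the owner of each iteration can be stored in an array. This is a sketch under assumptions: the helper name record_owners is hypothetical, and schedule(static) is written out explicitly here rather than left to the implementation default.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: owner[i] receives the id of the thread that ran iteration i. */
void record_owners(int *owner, int n) {
    assert(owner != 0 && n >= 0);
    /* Combined form: opens the parallel region and splits the loop in one directive.
       schedule(static) requests roughly equal blocks, at most one per thread. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;   /* built without OpenMP: one thread runs every iteration */
#endif
    }
}
```

Printing owner after a call with n = 6 and four threads would be expected to show a split like the 2, 2, 1, 1 seen in the run above, although the exact block sizes are left to the implementation.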
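Parallel sections
Syntax:
#pragma omp sections [clause,...] newline
{
[#pragma omp section newline]
......
[#pragma omp section newline]
......
}
Purpose: the enclosed code is divided among the threads of the team; each section is executed once, by one thread, and different sections may run on different threads
Note: by default all threads wait at the end of sections until every section has finished, unless the nowait clause is used
Example program: sections Example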
void sectionsex() {
    omp_set_num_threads(4);
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            printf("section 1 by threadid %d\n", omp_get_thread_num());
            Sleep(10);
        }
        #pragma omp section
        {
            printf("section 2 by threadid %d\n", omp_get_thread_num());
            Sleep(10);
        }
        #pragma omp section
        {
            printf("section 3 by threadid %d\n", omp_get_thread_num());
            Sleep(10);
        }
        #pragma omp section
        {
            printf("section 4 by threadid %d\n", omp_get_thread_num());
            Sleep(10);
        }
    }
}
Output:
section 1 by threadid 0
section 2 by threadid 1
section 4 by threadid 3
section 3 by threadid 2
Output after removing the Sleep(10) calls:
section 1 by threadid 0
section 4 by threadid 0
section 3 by threadid 1
section 2 by threadid 2
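Conclusions:
- Each section is handed to one thread of the team, and the assignment does not follow thread-id order (a thread with a larger id may pick up a section first). Which thread runs which section is up to the implementation.
- A thread that has already finished its section can pick up another one, so one thread may run several sections while other threads run none. Here that only happened once Sleep(10) was removed, i.e. when a section finished before the other threads had started theirs.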
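Sections are a natural fit for a small, fixed number of independent tasks. As a minimal sketch (the helper name sum_and_max is an assumption, not from the original notes), two unrelated reductions over the same array are placed in separate sections, and each section writes only its own result variable.

```c
#include <assert.h>

/* Hypothetical helper: computes the sum and the maximum of a[0..n-1],
   each in its own section. Requires n >= 1. */
void sum_and_max(const int *a, int n, long *sum, int *max) {
    assert(a != 0 && n >= 1);
    long s = 0;
    int m = a[0];
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* only this section writes s */
            for (int i = 0; i < n; i++) s += a[i];
        }
        #pragma omp section
        {
            /* only this section writes m */
            for (int i = 1; i < n; i++) {
                if (a[i] > m) m = a[i];
            }
        }
    }   /* implicit barrier: both results are complete here */
    *sum = s;
    *max = m;
}
```

Whether the two sections really run on different threads depends on the runtime; the result is the same either way, because neither section touches the other's variable.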
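Serial execution (single)
Syntax: #pragma omp single [clause] newline
Purpose: the enclosed code is executed by only one thread of the team (not necessarily the master thread); the other threads wait at the end of the single block until it has finished, unless the nowait clause is used
Example program: single Example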
void singleexample() {
    omp_set_num_threads(4);
    printf("master thread start\n");
    #pragma omp parallel
    {
        #pragma omp single // append nowait here for the second run
        {
            printf("single block in threadid %d\n", omp_get_thread_num());
            printf("waiting\n");
            Sleep(5000);
        }
        printf("parallel block in threadid %d\n", omp_get_thread_num());
    }
    printf("master thread finished\n");
}
Output:
master thread start
single block in threadid 0
waiting
parallel block in threadid 0
parallel block in threadid 3
parallel block in threadid 2
parallel block in threadid 1
master thread finished
Output with the nowait clause added:
master thread start
single block in threadid 0
parallel block in threadid 1
parallel block in threadid 3
parallel block in threadid 2
waiting
parallel block in threadid 0
master thread finished
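Conclusions:
- Without the nowait clause, single acts as a synchronization point: it guarantees that some operation, such as saving data or issuing a network request, has completed before any thread runs the code that follows.
- With the nowait clause, the single block can be used for work that the following code does not depend on, so the other threads keep running and the parallelism is put to full use.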
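The first conclusion can be sketched as a one-time initialization that every thread depends on. This is an illustrative sketch only: the helper name scale_all and the constant 3 are assumptions standing in for "load a setting once".

```c
#include <assert.h>

/* Hypothetical helper: one thread sets the scale factor, then all threads
   use it to scale their share of the array. */
void scale_all(int *data, int n) {
    assert(data != 0 && n >= 0);
    int factor = 0;                 /* shared by all threads in the region */
    #pragma omp parallel
    {
        #pragma omp single
        {
            factor = 3;             /* stand-in for a one-time setup step */
        }   /* implicit barrier: every thread sees factor == 3 from here on */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            data[i] *= factor;
        }
    }
}
```

Adding nowait to this single would be a bug, since another thread could read factor before it has been set; nowait only suits work the following code does not depend on.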
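Synchronization constructs
- master: execution by the master thread
- critical: critical section
- barrier: synchronization barrier
- atomic: atomic operation
- ordered: ordered region
The master directive
Syntax: #pragma omp master newline
Purpose: the enclosed code is executed only by the master thread; the other threads skip it without waiting, because master has no implied barrier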
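Unlike single, master does not make the other threads wait, so a value set inside a master block needs an explicit barrier before other threads read it. A minimal sketch, assuming a hypothetical helper add_offset and an arbitrary offset of 10:

```c
#include <assert.h>

/* Hypothetical helper: the master thread sets a shared offset, then every
   thread adds it to its share of the array. */
void add_offset(int *data, int n) {
    assert(data != 0 && n >= 0);
    int offset = 0;                 /* shared by all threads in the region */
    #pragma omp parallel
    {
        #pragma omp master
        {
            offset = 10;            /* runs on thread 0 only */
        }
        /* master has no implied barrier, so wait explicitly before reading offset */
        #pragma omp barrier

        #pragma omp for
        for (int i = 0; i < n; i++) {
            data[i] += offset;
        }
    }
}
```

This also shows the barrier directive from the list above: no thread passes it until every thread in the team has reached it.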
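The critical directive
Syntax: #pragma omp critical [(name)] newline
Blocks that share the same name can be executed by only one thread at a time; all unnamed critical blocks count as sharing one name
Purpose: the enclosed code is executed by one thread at a time; other threads block at the entry to the block until it is free
Example program: critical Example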
void criticalex() {
    omp_set_num_threads(4);
    printf("master thread start\n");
    int g = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10000; i++)
    {
        Sleep(1);
        #pragma omp critical
        g++; // only one thread at a time may update g
    }
    printf("g = %d\n", g);
    printf("expected g = 10000\n");
    printf("master thread finished\n");
}
Output:
master thread start
g = 10000
expected g = 10000
master thread finished
Output with the critical directive removed:
master thread start
g = 9597
expected g = 10000
master thread finished
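Conclusion:
- The critical directive ensures that no two threads run the protected code at the same time. Without it, concurrent g++ updates race and some increments are lost (9597 instead of 10000 in the run above).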
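The optional name matters when unrelated data is protected: blocks with different names do not exclude each other. The sketch below is an illustration under assumptions; the helper count_parity and the names even_count and odd_count are made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: counts the even and odd values in a[0..n-1]. */
void count_parity(const int *a, int n, int *evens, int *odds) {
    assert(a != 0 && n >= 0);
    int e = 0, o = 0;               /* shared counters */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            /* threads updating e exclude each other, but not threads updating o */
            #pragma omp critical (even_count)
            e++;
        } else {
            #pragma omp critical (odd_count)
            o++;
        }
    }
    *evens = e;
    *odds = o;
}
```

With two unnamed critical blocks instead, a thread incrementing e would also hold up threads that only want to increment o.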
The atomic directive
Syntax: #pragma omp atomic newline
Purpose: the memory location updated by the statement that follows is updated atomically
Example program: atomic Example
void atomicexample() {
    omp_set_num_threads(2);
    int count = 0;
    #pragma omp parallel
    {
        // no "omp for" here: each of the 2 threads runs all 1000 iterations
        for (int i = 0; i < 1000; i++)
        {
            Sleep(1);
            #pragma omp atomic
            count++;
        }
    }
    printf("count = %d\n", count);
}
Output:
count = 2000
Output with the atomic directive removed:
count = 1919
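Comparison with the critical directive:
- critical can protect a whole block of code, and several blocks can be placed in the same mutual-exclusion region by giving them the same name
- atomic applies to a single update statement only
- atomic is expected to perform better than critical (not measured here)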
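Since the performance claim is unmeasured, one way to check it is to time the same counter loop under both directives. The following is only a sketch: every function name in it is hypothetical, no timings are claimed, and the results will depend on the compiler, the thread count and the machine.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to CPU time when built without OpenMP. */
double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: n increments of a shared counter, protected by atomic. */
long count_with_atomic(long n) {
    assert(n >= 0);
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Hypothetical helper: the same loop, protected by critical instead. */
long count_with_critical(long n) {
    assert(n >= 0);
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp critical
        count++;
    }
    return count;
}

/* Prints the elapsed time of both variants for the same n. */
void compare_atomic_critical(long n) {
    double t0 = now_seconds();
    long a = count_with_atomic(n);
    double t1 = now_seconds();
    long c = count_with_critical(n);
    double t2 = now_seconds();
    printf("atomic:   count = %ld, %.6f s\n", a, t1 - t0);
    printf("critical: count = %ld, %.6f s\n", c, t2 - t1);
}
```

Both variants must return n; only the elapsed times are expected to differ. A call such as compare_atomic_critical(10000000) repeated a few times would give a rough comparison.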
