OpenMP Study Notes

A First Look at OpenMP

Example: Hello World

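// Headers used by the examples in these notes; compile with OpenMP enabled (/openmp in MSVC, -fopenmp in GCC)
#include <stdio.h>
#include <omp.h>
#include <windows.h> // Sleep(), used by the later examples (Windows only)

void helloworld() {
	omp_set_num_threads(4); // set the number of threads
	int nthreads, threadid;
#pragma omp parallel private(nthreads,threadid)
	{
		threadid = omp_get_thread_num();
		printf("Hello World from OMP thread %d\n", threadid);
		if (threadid == 0) {
			nthreads = omp_get_num_threads();
			printf("Numbers of threads: %d\n", nthreads);
		}
	}
}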

Output:

Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 2
Hello World from OMP thread 1
Hello World from OMP thread 3

Output of a second run:

Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 3
Hello World from OMP thread 1
Hello World from OMP thread 2

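Conclusions:

In both runs the master thread (thread 0) printed first, most likely because it is already running and does not need to be woken up.
The other threads start in a different order from run to run (or differ in speed); OpenMP makes no guarantee about the order in which threads execute.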
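The conclusions above can be checked without relying on the printed order. The sketch below assumes a compiler with OpenMP enabled (e.g. gcc -fopenmp); the helper name mark_thread_ids and the fallback stubs are hypothetical, added only for illustration. Each thread marks its own ID, so whatever order the threads start in, every ID from 0 to the team size minus 1 should be marked exactly once.

```c
#include <string.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical stubs so the sketch still builds without OpenMP (the pragmas are then ignored). */
static inline int omp_get_thread_num(void) { return 0; }
static inline int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64

/* Hypothetical helper: sets seen[tid] = 1 for every thread that runs the region
   and returns the team size reported by thread 0. */
int mark_thread_ids(int seen[MAX_THREADS])
{
    int nthreads = 0;
    memset(seen, 0, MAX_THREADS * sizeof seen[0]);

#pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num(); /* declared inside the region, so private */
        seen[tid] = 1;                  /* each thread writes its own slot: no race */
        if (tid == 0)
            nthreads = omp_get_num_threads();
    }
    /* the parallel region ends with an implicit barrier, so nthreads is set here */
    return nthreads;
}
```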

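Directive syntax

General form: #pragma omp directive-name [clause,...] newline

#pragma omp: directive prefix, always written exactly this way
directive-name: the directive itself, e.g. parallel in Hello World
[clause,...]: optional clauses, e.g. private(nthreads,threadid) in Hello World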
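As a sketch of several clauses on one directive (the function name count_members is hypothetical): num_threads(4) requests the team size on the directive itself instead of calling omp_set_num_threads, default(none) forces the data-sharing attribute of every variable used in the region to be written out, and shared(members) declares the one variable all threads update. It assumes OpenMP is enabled; without it the pragmas are ignored and the function returns 1.

```c
/* Hypothetical helper: returns how many threads executed the parallel region. */
int count_members(void)
{
    int members = 0;

    /* directive-name: parallel
       clauses: num_threads(4), default(none), shared(members) */
#pragma omp parallel num_threads(4) default(none) shared(members)
    {
#pragma omp atomic
        members++; /* one increment per thread; atomic prevents lost updates */
    }
    return members;
}
```

With OpenMP enabled this normally returns 4, although the runtime may provide fewer threads, for example when dynamic adjustment of the team size is turned on.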

Work-sharing constructs

Parallel for loop
Parallel sections
Serial execution (single)

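Parallel for loop

Syntax: #pragma omp for [clause,...] newline
Purpose: the loop that immediately follows is executed in parallel by the thread team, with its iterations divided among the threads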

Example: for

void forexample() {
	omp_set_num_threads(4);
	int i;
#pragma omp parallel
	{
#pragma omp for
		for (i = 0; i < 6; i++)
		{
			printf("i = %d, threadid = %d\n", i, omp_get_thread_num());
			Sleep(10);
		}
	}
}

Output:

i = 0, threadid = 0
i = 4, threadid = 2
i = 2, threadid = 1
i = 5, threadid = 3
i = 1, threadid = 0
i = 3, threadid = 1

Conclusions:

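The iterations are divided as evenly as possible among the threads (unless there are more threads than iterations); the unit of work is a single loop iteration.
Here the iterations were handed out in contiguous blocks in thread-ID order (thread 0 got i = 0 and 1, thread 1 got i = 2 and 3, thread 2 got i = 4, thread 3 got i = 5). This is how the static schedule behaves, which is typically the default; the default schedule is implementation-defined, so write schedule(static) explicitly if you rely on this order.
If the parallel region contains only one for loop, the two directives can be merged into #pragma omp parallel for, which must be immediately followed by the for loop.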
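The thread-ID order described above can be made explicit. The sketch below (record_owners and the stub are hypothetical names) uses the combined parallel for form with schedule(static) and records which thread ran each iteration. Under schedule(static) without a chunk size, OpenMP gives each thread at most one contiguous block and assigns the blocks in thread-number order, so the recorded IDs should never decrease.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical stub so the sketch still builds without OpenMP. */
static inline int omp_get_thread_num(void) { return 0; }
#endif

#define N 6

/* Hypothetical helper: owner[i] receives the ID of the thread that executed iteration i. */
void record_owners(int owner[N])
{
    /* Combined form: one parallel region containing exactly one for loop. */
#pragma omp parallel for num_threads(4) schedule(static)
    for (int i = 0; i < N; i++)
        owner[i] = omp_get_thread_num();
}
```

With four threads and N = 6, common runtimes typically produce the same distribution as the output above.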

Parallel sections

Syntax:

#pragma omp sections [clause,...] newline
{
	[#pragma omp section newline]
		......
	[#pragma omp section newline]
		......
}

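Purpose: the enclosed code is divided among the threads of the team. Each section is executed exactly once, by one thread; different sections can run on different threads, but one thread may also execute several sections.

Note: sections ends with an implicit barrier, so all threads wait until every section has finished, unless the nowait clause is used.

Example: sections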

void sectionsex() {
	omp_set_num_threads(4);
#pragma omp parallel sections
	{
#pragma omp section
		{
			printf("section 1 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 2 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 3 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 4 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
	}
}

Output:

section 1 by threadid 0
section 2 by threadid 1
section 4 by threadid 3
section 3 by threadid 2

Output after removing Sleep(10):

section 1 by threadid 0
section 4 by threadid 0
section 3 by threadid 1
section 2 by threadid 2

Conclusions:

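Each section is handed to a thread of the team; the order is not fixed, and a thread with a larger ID may start first.
A thread that has already finished its section may pick up the next pending one instead of another thread being started. In these runs this happened only when a section's work took less time than starting a thread (i.e. without Sleep(10)); the actual assignment is left to the implementation.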
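Whichever thread picks up a section, each section runs exactly once, and the implicit barrier at the end of sections makes the results visible afterwards. A minimal sketch, assuming OpenMP is enabled; run_sections is a hypothetical name. Every section writes only its own slot, so no extra synchronization is needed.

```c
/* Hypothetical helper: each section increments its own counter exactly once. */
void run_sections(int done[3])
{
#pragma omp parallel sections num_threads(4)
    {
#pragma omp section
        done[0]++;
#pragma omp section
        done[1]++;
#pragma omp section
        done[2]++;
    }
    /* implicit barrier above: all three sections have finished here */
}
```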

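Serial execution (single)

Syntax: #pragma omp single [clause,...] newline

Purpose: the enclosed code is executed by only one thread of the team; the other threads wait at the end of the single block (an implicit barrier) until it has finished, unless the nowait clause is used.

Example: single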

void singleexample() {
	omp_set_num_threads(4);
	printf("master thread start\n");

#pragma omp parallel
	{
#pragma omp single // to try nowait, change this line to: #pragma omp single nowait
		{
			printf("single block in threadid %d\n", omp_get_thread_num());
			printf("waiting\n");
			Sleep(5000);
		}
		printf("parallel block in threadid %d\n", omp_get_thread_num());
	}
	printf("master thread finished\n");
}

Output:

master thread start
single block in threadid 0
waiting
parallel block in threadid 0
parallel block in threadid 3
parallel block in threadid 2
parallel block in threadid 1
master thread finished

Output with the nowait clause added:

master thread start
single block in threadid 0
parallel block in threadid 1
parallel block in threadid 3
parallel block in threadid 2
waiting
parallel block in threadid 0
master thread finished

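Conclusions:

Without nowait, single acts as a synchronization point: no thread continues past it until the block has finished, which guarantees that certain work (saving data, a network request, etc.) is done before the following code runs.
With nowait, the single block can carry a task that the following code does not depend on, so the other threads keep working and the parallelism is fully used.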
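The first use case above, finishing some work before the code that follows, can be sketched as below. It assumes one thread prepares a shared value that all threads then read; the function name single_then_use is hypothetical. Because single without nowait ends in an implicit barrier, every thread should see base == 10, so total comes out as exactly 10 times the team size. Adding nowait here could let threads read base before it is written.

```c
/* Hypothetical helper: one thread initializes base, then every thread adds it to total. */
void single_then_use(int *total_out, int *members_out)
{
    int base = 0, total = 0, members = 0; /* declared outside the region: shared */

#pragma omp parallel num_threads(4)
    {
#pragma omp single
        base = 10; /* executed by exactly one thread */
        /* implicit barrier here: base is now visible to all threads */

#pragma omp atomic
        total += base;
#pragma omp atomic
        members++;
    }
    *total_out = total;
    *members_out = members;
}
```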

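Synchronization constructs

master: executed by the master thread only
critical: critical section
barrier: synchronization barrier
atomic: atomic operation
ordered: ordered region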

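The master directive

Syntax: #pragma omp master newline

Purpose: the enclosed code is executed only by the master thread (thread 0); the other threads skip it. Unlike single, master has no implied barrier at its end, so the other threads do not wait for it.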
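Because master has no implied barrier, code that depends on the master thread's work needs an explicit barrier, one of the synchronization constructs listed above. A minimal sketch, assuming OpenMP is enabled; master_then_barrier is a hypothetical name. After the barrier, every thread should observe the flag set by thread 0.

```c
/* Hypothetical helper: the master thread sets flag; after the barrier every thread checks it. */
void master_then_barrier(int *saw_flag_out, int *members_out)
{
    int flag = 0, saw_flag = 0, members = 0;

#pragma omp parallel num_threads(4)
    {
#pragma omp master
        flag = 1; /* only thread 0 executes this */

        /* master has no implied barrier, so wait here explicitly */
#pragma omp barrier

        if (flag == 1) {
#pragma omp atomic
            saw_flag++;
        }
#pragma omp atomic
        members++;
    }
    *saw_flag_out = saw_flag;
    *members_out = members;
}
```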

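The critical directive

Syntax: #pragma omp critical [(name)] newline
The optional name is written in parentheses. Critical sections with the same name can be executed by only one thread at a time; all unnamed critical sections count as having the same name.

Purpose: the enclosed block is executed by only one thread at a time; the other threads block at its entrance until it is free.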
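Named critical sections let unrelated updates proceed independently. In the sketch below (named_critical, update_a and update_b are hypothetical names), a thread updating a does not block a thread updating b, while updates to the same counter are still serialized. If both sections were unnamed, they would share a single lock.

```c
#define ITERS 1000

/* Hypothetical helper: two counters, each guarded by its own named critical section. */
void named_critical(int *a_out, int *b_out)
{
    int a = 0, b = 0;

#pragma omp parallel for num_threads(4)
    for (int i = 0; i < ITERS; i++) {
#pragma omp critical(update_a)
        a++;
#pragma omp critical(update_b)
        b += 2;
    }
    *a_out = a;
    *b_out = b;
}
```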

Example: critical

void criticalex() {
	omp_set_num_threads(4);
	printf("master thread start\n");
	int g = 0;
#pragma omp parallel for
	for (int i = 0; i < 10000; i++)
	{
		Sleep(1);
#pragma omp critical
		g++;
	}
	printf("g = %d\n", g);
	printf("expected g = 10000\n");
	printf("master thread finished\n");
}

Output:

master thread start
g = 10000
expected g = 10000
master thread finished

Output after removing the critical directive:

master thread start
g = 9597
expected g = 10000
master thread finished

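Conclusions:

The critical directive guarantees that no two threads execute the protected code at the same time; without it, concurrent g++ operations overwrite each other and some increments are lost.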

The atomic directive

Syntax: #pragma omp atomic newline

Purpose: the memory location modified by the statement that follows is updated atomically

Example: atomic

void atomicexample() {
	omp_set_num_threads(2);
	int count = 0;
#pragma omp parallel
	{
		for (int i = 0; i < 1000; i++)
		{
			Sleep(1);
#pragma omp atomic
			count++;
		}
	}
	printf("count = %d\n", count);
}

Output:

count = 2000

Output after removing the atomic directive:

count = 1919

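Comparison with critical:

critical can protect a whole block of code, and several blocks can share the same lock by giving them the same name
atomic applies only to a single update of one memory location (e.g. count++ or x += expr)
atomic is expected to perform better than critical because it can be implemented with hardware atomic instructions (not verified here)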
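The difference in scope can be shown in one loop. In this sketch (sum_and_argmax is a hypothetical name and the data are illustrative), the running sum is a single update of one variable, so atomic is enough. Tracking the maximum and its index is a compare followed by writes to two variables, which the plain atomic update form used in these notes cannot express, so it goes into a critical section.

```c
#define M 1000

/* Hypothetical helper: computes the sum of data and the first index holding the maximum value. */
void sum_and_argmax(const int data[M], long *sum_out, int *best_out, int *best_idx_out)
{
    long sum = 0;
    int best = data[0], best_idx = 0;

#pragma omp parallel for num_threads(4)
    for (int i = 0; i < M; i++) {
        /* one read-modify-write of a single scalar: atomic is enough */
#pragma omp atomic
        sum += data[i];

        /* compare, then update two variables together: needs critical */
#pragma omp critical(argmax)
        {
            if (data[i] > best || (data[i] == best && i < best_idx)) {
                best = data[i];
                best_idx = i;
            }
        }
    }
    *sum_out = sum;
    *best_out = best;
    *best_idx_out = best_idx;
}
```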

To be continued...

posted @ 2020-03-09 16:22 logtea
