OpenMP Study Notes

A First Look at OpenMP

Example program: Hello World

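#include <omp.h>
#include <stdio.h>
#include <windows.h> // Sleep(), used by the later examples (Windows only)

void helloworld() {
	omp_set_num_threads(4); // set the number of threads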
	int nthreads, threadid;
#pragma omp parallel private(nthreads,threadid)
	{
		threadid = omp_get_thread_num();
		printf("Hello World from OMP thread %d\n", threadid);
		if (threadid == 0) {
			nthreads = omp_get_num_threads();
			printf("Numbers of threads: %d\n", nthreads);
		}
	}
}

Output:

Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 2
Hello World from OMP thread 1
Hello World from OMP thread 3

Output of a second run:

Hello World from OMP thread 0
Numbers of threads: 4
Hello World from OMP thread 3
Hello World from OMP thread 1
Hello World from OMP thread 2

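Conclusions:

  • In both runs the master thread (ID 0) printed first, most likely because it is already running when the parallel region starts and does not have to be woken up. OpenMP does not guarantee this order.
  • The other threads start in no fixed order (it depends on how the OS schedules them), so the order of their output changes from run to run.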

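Directive format

  • #pragma omp: the directive prefix, always written this way
  • directive-name: the directive itself, for example parallel in Hello World
  • [clause,...]: optional clauses, for example private(nthreads,threadid) in Hello World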
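As a small illustration of the three parts, the sketch below combines the prefix, a directive name (parallel for), and two clauses. The function name scale_array and the variable tmp are made up for this example, and shared is a standard clause that these notes do not otherwise use.

```c
/* Multiplies every element of a[0..n-1] by factor. */
void scale_array(double *a, int n, double factor) {
	int i;
	double tmp; /* hypothetical scratch variable; private gives each thread its own copy */
	/* prefix: "#pragma omp", directive-name: "parallel for",
	   clauses: private(tmp) and shared(a) */
#pragma omp parallel for private(tmp) shared(a)
	for (i = 0; i < n; i++) {
		tmp = a[i] * factor;
		a[i] = tmp;
	}
}
```

Without private(tmp), all threads would share a single tmp and could overwrite each other's intermediate value.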

Work-sharing constructs

  • Parallel for loops
  • Parallel sections
  • Serial execution (single)

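Parallel for loops

Format: #pragma omp for [clause,...] newline
Purpose: the iterations of the loop that immediately follows are divided among the threads of the team and executed in parallel

Example program: for Example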

void forexample() {
	omp_set_num_threads(4);
	int i;
#pragma omp parallel
	{
#pragma omp for
		for (i = 0; i < 6; i++)
		{
			printf("i = %d, threadid = %d\n", i, omp_get_thread_num());
			Sleep(10);
		}
	}
}

Output:

i = 0, threadid = 0
i = 4, threadid = 2
i = 2, threadid = 1
i = 5, threadid = 3
i = 1, threadid = 0
i = 3, threadid = 1

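Conclusions:

  • The work is split as evenly as possible among the threads (unless there are more threads than tasks); each task is one iteration of the for loop
  • Iterations are handed out in contiguous blocks in thread-ID order: here thread 0 ran i = 0 and 1, thread 1 ran i = 2 and 3, thread 2 ran i = 4, and thread 3 ran i = 5. This matches a static schedule; the schedule used when none is specified is implementation-defined
  • If a parallel region contains only one parallel for loop, the two directives can be combined into #pragma omp parallel for, which must be immediately followed by the for loop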
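To make the iteration-to-thread assignment explicit instead of relying on the default, a schedule clause can be added. The sketch below is an assumption-laden illustration, not part of the original experiments: the function name record_owners is made up, schedule(static, 1) is a standard clause not covered in these notes, and the #else branch provides serial stand-ins so the code also builds when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP enabled. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Stores in owner[i] the ID of the thread that ran iteration i
   and returns the size of the thread team. */
int record_owners(int *owner, int n) {
	int team = 1;
#pragma omp parallel num_threads(4)
	{
		if (omp_get_thread_num() == 0) {
			team = omp_get_num_threads(); /* written by thread 0 only */
		}
		/* static schedule with chunk size 1: chunks are dealt out
		   round-robin in thread-ID order */
#pragma omp for schedule(static, 1)
		for (int i = 0; i < n; i++) {
			owner[i] = omp_get_thread_num();
		}
	}
	return team;
}
```

With this schedule, iteration i is run by thread i % team, so the mapping no longer depends on the implementation's default.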

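Parallel sections

Format:

#pragma omp sections [clause,...] newline
{
	[#pragma omp section newline]
		......
	[#pragma omp section newline]
		......
}

Purpose: each section block inside is executed exactly once by one thread of the team, and different sections can be executed by different threads

Note: by default all threads wait at the end of sections until every section has finished, unless the nowait clause is used

Example program: sections Example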

void sectionsex() {
	omp_set_num_threads(4);
#pragma omp parallel sections
	{
#pragma omp section
		{
			printf("section 1 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 2 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 3 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
		
#pragma omp section
		{
			printf("section 4 by threadid %d\n", omp_get_thread_num());
			Sleep(10);
		}
	}
}

Output:

section 1 by threadid 0
section 2 by threadid 1
section 4 by threadid 3
section 3 by threadid 2

Output after removing Sleep(10):

section 1 by threadid 0
section 4 by threadid 0
section 3 by threadid 1
section 2 by threadid 2

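Conclusions:

  • The sections are distributed among the threads of the team as the threads become available; which thread runs which section is not fixed (a thread with a higher ID may start first)
  • A thread that has already finished its section can pick up another one, so when a section finishes faster than the remaining threads get going, one thread may run several sections (thread 0 ran sections 1 and 4 above)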
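Sections are a natural fit for a few independent tasks that do not share results. As a hedged sketch (the function name sum_and_max and its parameters are invented for illustration), the two sections below compute unrelated values from the same read-only array, and each shared variable is written by one section only:

```c
/* Computes the sum and the maximum of a[0..n-1]; requires n >= 1. */
void sum_and_max(const int *a, int n, long *sum, int *max) {
	long s = 0;
	int m = a[0];
#pragma omp parallel sections
	{
#pragma omp section
		{
			/* only this section writes s */
			for (int i = 0; i < n; i++) {
				s += a[i];
			}
		}
#pragma omp section
		{
			/* only this section writes m */
			for (int i = 1; i < n; i++) {
				if (a[i] > m) {
					m = a[i];
				}
			}
		}
	}
	/* the implicit barrier at the end of sections means both results are ready here */
	*sum = s;
	*max = m;
}
```

Whichever thread runs which section, the results are the same, because the sections do not depend on each other.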

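Serial execution

Format: #pragma omp single [clause] newline

Purpose: the enclosed code is executed by only one thread; the other threads wait until the single block has finished, unless the nowait clause is used

Example program: single Example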

void singleexample() {
	omp_set_num_threads(4);
	printf("master thread start\n");

#pragma omp parallel
	{
#pragma omp single //nowait
		{
			printf("single block in threadid %d\n", omp_get_thread_num());
			printf("waiting\n");
			Sleep(5000);
		}
		printf("parallel block in threadid %d\n", omp_get_thread_num());
	}
	printf("master thread finished\n");
}

Output:

master thread start
single block in threadid 0
waiting
parallel block in threadid 0
parallel block in threadid 3
parallel block in threadid 2
parallel block in threadid 1
master thread finished

Output after adding the nowait clause:

master thread start
single block in threadid 0
parallel block in threadid 1
parallel block in threadid 3
parallel block in threadid 2
waiting
parallel block in threadid 0
master thread finished

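Conclusions:

  • Without nowait, single behaves like a synchronization point: it guarantees that some operation (saving data, a network request, and so on) has completed before any thread runs the code that follows
  • With nowait, the single block can do work that the following code does not depend on, so the other threads keep running and the parallelism is fully used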
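The first case can be sketched as follows: one thread prepares a shared value inside single, and the implicit barrier at the end of the block ensures every thread sees it before the loop starts. The names fill_with_config and config are hypothetical, and the constant 5 stands in for whatever one-time setup a real program would do.

```c
/* Fills out[i] with config * i, where config is set once by a single thread. */
void fill_with_config(int *out, int n) {
	int config = 0; /* shared by all threads in the region */
#pragma omp parallel
	{
#pragma omp single
		{
			config = 5; /* stand-in for one-time setup work */
		} /* implicit barrier: no thread continues until config is set */
#pragma omp for
		for (int i = 0; i < n; i++) {
			out[i] = config * i;
		}
	}
}
```

Adding nowait to this single would be a bug, because the loop could then read config before it has been written.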

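Synchronization constructs

  • master: executed by the master thread only
  • critical: critical section
  • barrier: synchronization barrier
  • atomic: atomic operation
  • ordered: ordered region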

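The master directive

Format: #pragma omp master newline

Purpose: the code block is executed by the master thread only; unlike single, the other threads do not wait at the end of the block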
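Because master implies no barrier on entry or exit, code that depends on what the master thread did needs an explicit #pragma omp barrier. A minimal sketch, with the invented names scale_by_master_value and factor:

```c
/* Fills out[i] with factor * i, where factor is set by the master thread. */
void scale_by_master_value(int *out, int n) {
	int factor = 0; /* shared by all threads in the region */
#pragma omp parallel
	{
#pragma omp master
		{
			factor = 3; /* runs on thread 0 only; other threads skip ahead */
		}
		/* explicit barrier: wait until the master thread has written factor */
#pragma omp barrier
#pragma omp for
		for (int i = 0; i < n; i++) {
			out[i] = factor * i;
		}
	}
}
```

Removing the barrier would let the other threads enter the loop while factor may still be 0.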

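The critical directive

Format: #pragma omp critical [(name)] newline
Blocks with the same name can be executed by only one thread at a time; all unnamed critical blocks share one name

Purpose: the code block is executed by only one thread at a time; other threads block at the start of the block

Example program: critical Example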

void criticalex() {
	omp_set_num_threads(4);
	printf("master thread start\n");
	int g = 0;
#pragma omp parallel for
	for (int i = 0; i < 10000; i++)
	{
		Sleep(1);
#pragma omp critical
		g++;
	}
	printf("g = %d\n", g);
	printf("expected g = 10000\n");
	printf("master thread finished\n");
}

Output:

master thread start
g = 10000
expected g = 10000
master thread finished

Output after removing the critical directive:

master thread start
g = 9597
expected g = 10000
master thread finished

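Conclusions:

  • The critical directive guarantees that threads never execute the block at the same time; without it the concurrent g++ updates race and some increments are lost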
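The optional name matters when unrelated data is protected: blocks with different names do not exclude each other. The sketch below is only an illustration; count_even_odd, even_count and odd_count are made-up names.

```c
/* Counts the even and odd integers in 0..n-1. */
void count_even_odd(int n, int *even, int *odd) {
	int e = 0, o = 0;
#pragma omp parallel for
	for (int i = 0; i < n; i++) {
		if (i % 2 == 0) {
			/* excludes only other threads inside critical(even_count) */
#pragma omp critical(even_count)
			e++;
		} else {
			/* a different name, so it can run while another thread updates e */
#pragma omp critical(odd_count)
			o++;
		}
	}
	*even = e;
	*odd = o;
}
```

Had both blocks been left unnamed, they would share one lock and an update of e would also block updates of o.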

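The atomic directive

Format: #pragma omp atomic newline

Purpose: the specific memory location updated by the following statement is updated atomically

Example program: atomic Example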

void atomicexample() {
	omp_set_num_threads(2);
	int count = 0;
#pragma omp parallel
	{
		for (int i = 0; i < 1000; i++)
		{
			Sleep(1);
#pragma omp atomic
			count++;
		}
	}
	printf("count = %d\n", count);
}

Output:

count = 2000

Output after removing the atomic directive:

count = 1919

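Comparison with the critical directive:

  • critical protects a whole code block, and several blocks can be placed in the same mutually exclusive region
  • atomic applies to a single atomic update statement only
  • atomic should perform better than critical (not verified in practice)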
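A case where the single-statement limit is no obstacle is a histogram, where each update touches one array element. This is a sketch under assumptions: the function name histogram is invented, the input values are assumed to be non-negative, and the caller is assumed to have zeroed bins beforehand.

```c
/* Adds to bins[v % nbins] once for every value v in data[0..n-1]. */
void histogram(const int *data, int n, int *bins, int nbins) {
#pragma omp parallel for
	for (int i = 0; i < n; i++) {
		int b = data[i] % nbins; /* assumes data[i] >= 0 */
		/* atomic protects only this one update of bins[b] */
#pragma omp atomic
		bins[b]++;
	}
}
```

Unlike a single critical block, atomic only has to protect the one memory location being updated, so threads incrementing different bins are not required to wait for each other.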

To be continued...

posted @ 2020-03-09 16:22  logtea