Warp divergence

Threads are executed in warps of 32, with all threads in the warp executing the same instruction at the same time What happens if different threads in a warp need to do different things?
if (x<0.0)
  z = x-2.0;
else
  z = sqrt(x);


This is called warp divergence – CUDA will generate correct code to handle this, but to understand the performance you need to understand what CUDA does with it, all threads execute both conditional branches, so execution cost is sum of both branches ⇒ potentially large loss of performance

cuda 的实现是将两条路径的代码都运行了,只是让那个不符合的路径返回一个奇怪的值。原因在于在同一时间,所有的thread必须执行相同的instruction(指令),这里的thread是所有的thread,不只是同一个block中的。所以即使你让thread分开执行if和else语句,那也是一部分thread执行if的语句,其他的thread要执行else的语句要等待他执行完,然后这些执行else的语句。这与所有的thread同时执行if和else的代码是一样的,因此两个都执行,这是CUDA的实现方法。部分thread不能工作,造成闲置而降低了效率,成为divergence。

 

如果对thread进行if statement判断,就会出现warp divergence。

 

 

 

 

 

posted @ 2013-11-08 15:37  qingsun_ny  阅读(521)  评论(0编辑  收藏  举报