Parallel Computing - Post Category - waytofall

[Z]K.I.S.S. Random Generator: the "Keep It Simple" random number generator

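Abstract: (Pseudo)random number generators are the foundation of Monte Carlo algorithms, so they naturally get a lot of attention. In recent years the Mersenne Twister [Makoto Matsumoto], invented in Japan, has been the standout among them. It is not perfect, though. First, it is unsuitable for data encryption; second, it fails some statistical tests; third, its source code is fairly long and cannot be understood at a glance. By contrast, the Keep-It-Simple-And-Stupid algorithm, invented by Marsaglia [Marsaglia] back in the late 1990s, is interesting and short, and it passes all of the DIEHARD tests [DIEHARD]. Here is the source code: unsigned int x = 123456789, y = 362436000, . Read more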
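The excerpt above cuts off after the first two seed values, so the following is only a minimal sketch of what a KISS-style generator can look like, not the listing from the original post. It combines a linear congruential step, a xorshift step, and a multiply-with-carry step. The seeds `z` and `c`, the multiplier constants, and the names `kiss_state`, `kiss_default`, and `kiss_next` are assumptions based on a commonly circulated KISS variant; the truncated source confirms only `x` and `y`.

```c
#include <stdint.h>

/* Generator state. Only the x and y seeds appear in the excerpt;
   z and c are assumed from a commonly circulated KISS listing. */
typedef struct {
    uint32_t x, y, z, c;
} kiss_state;

/* Hypothetical helper: returns the default-seeded state. */
static kiss_state kiss_default(void) {
    kiss_state s = {123456789u, 362436000u, 521288629u, 7654321u};
    return s;
}

/* One step: the sum of three simple generators, all arithmetic mod 2^32. */
static uint32_t kiss_next(kiss_state *s) {
    const uint64_t a = 698769069ULL;   /* assumed multiply-with-carry multiplier */
    uint64_t t;

    s->x = 69069u * s->x + 12345u;     /* linear congruential step */

    s->y ^= s->y << 13;                /* xorshift step; y must never be seeded with 0 */
    s->y ^= s->y >> 17;
    s->y ^= s->y << 5;

    t = a * s->z + s->c;               /* multiply-with-carry step */
    s->c = (uint32_t)(t >> 32);        /* high 32 bits become the new carry */
    s->z = (uint32_t)t;                /* low 32 bits become the new z */

    return s->x + s->y + s->z;
}
```

To use it, create a state with `kiss_default()` and call `kiss_next(&state)` repeatedly. Keeping the state in a struct rather than in globals means each thread can own its own generator, which matters for parallel Monte Carlo code.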

posted @ 2013-03-17 15:08 waytofall Views(381) Comments(0) Recommended(0)

[z]Two C++ parallel computing libraries

Abstract: STAPL https://parasol.tamu.edu/stapl/ POOMA http://acts.nersc.gov/formertools/pooma/index.html Read more

posted @ 2013-03-14 10:38 waytofall Views(431) Comments(0) Recommended(0)

[z]ArrayFire: a CUDA-based parallel computing library

Abstract: http://www.accelereyes.com/arrayfire/c/index.htm Read more

posted @ 2013-03-14 09:45 waytofall Views(303) Comments(0) Recommended(0)

[z]GPU Gems 2 - Chapter 37. Octree Textures on the GPU

Abstract: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter37.html Read more

posted @ 2013-03-13 20:23 waytofall Views(217) Comments(0) Recommended(0)

[z]GPU Gems 3 - Chapter 37. Efficient Random Number Generation and Application Using CUDA

Abstract: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html Read more

posted @ 2013-03-13 14:30 waytofall Views(168) Comments(0) Recommended(0)

A discussion of octree traversal on the NVIDIA Developer Zone

Abstract: https://devtalk.nvidia.com/default/topic/409587/best-way-of-traversing-an-octree-in-cuda-/ Read more

posted @ 2013-03-12 18:51 waytofall Views(251) Comments(0) Recommended(0)

[Z]A tutorial on sorting networks

Abstract: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/networks/indexen.htm Read more

posted @ 2013-03-12 17:17 waytofall Views(176) Comments(0) Recommended(0)

[Z]The CUDA memory model

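Abstract: The CUDA memory model. On the GPU chip: registers, shared memory. On-board device memory: local memory, constant memory, texture memory, global memory. Host memory: host memory, pinned memory. Registers: extremely low access latency; basic unit: the register file (32 bits each). Compute capability 1.0/1.1 hardware: 8192 per SM; compute capability 1.2/1.3 hardware: 16384 per SM. Each thread gets only a limited number of registers, so avoid giving a thread too many private variables when programming. local mem Read more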
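The register counts quoted above put an upper bound on how many threads can be resident on one SM. A rough sketch of that arithmetic follows. The helper name `max_threads_by_registers` is hypothetical, and the calculation deliberately ignores register allocation granularity and every other hardware limit (shared memory, maximum threads and blocks per SM), so treat the result as a ceiling rather than an exact occupancy figure.

```c
/* Registers per SM, as quoted in the abstract above. */
enum {
    REGS_PER_SM_CC_1_0 = 8192,   /* compute capability 1.0 / 1.1 */
    REGS_PER_SM_CC_1_2 = 16384   /* compute capability 1.2 / 1.3 */
};

/* Hypothetical helper: upper bound on resident threads per SM imposed by
   register usage alone. Ignores allocation granularity and other limits. */
static int max_threads_by_registers(int regs_per_sm, int regs_per_thread) {
    if (regs_per_thread <= 0) {
        return 0;                /* no meaningful answer for non-positive input */
    }
    return regs_per_sm / regs_per_thread;
}
```

For example, a kernel that needs 16 registers per thread is capped at 8192 / 16 = 512 threads per SM on compute capability 1.0 hardware under this simplified model, and doubling the register use to 32 halves that to 256. That is why the abstract warns against too many private variables per thread.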

posted @ 2013-02-25 16:09 waytofall Views(891) Comments(0) Recommended(0)

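[z]CUDA hardware architecture

Abstract: Although it is written in Traditional Chinese, it offers a commanding, big-picture view. For beginners, especially those unfamiliar with hardware architecture, several parts are a real eye-opener after reading the official Programming Guide. http://www2.kimicat.com/gpu%E7%9A%84%E7%A1%AC%E9%AB%94%E6%9E%B6%E6%A7%8B Read more

posted @ 2013-02-21 23:40 waytofall Views(226) Comments(0) Recommended(0)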

[z]A post on OpenGPU about coalesced memory access in CUDA

Abstract: http://www.opengpu.org/forum.php?mod=viewthread&tid=2635 Read more

posted @ 2013-02-19 22:58 waytofall Views(305) Comments(0) Recommended(0)

[Z]Bank conflicts in CUDA

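Abstract: To be honest, for the past couple of days I had no idea what a bank conflict was. I have been looking at the matrix transpose optimization problem, which touches on the topic, so to understand bank conflicts I had to go and find material. Frankly, I still do not understand it completely, but I have a rough idea now. I am collecting what I found online here, and once I fully understand it I will come back and correct any mistakes. Each SM on Tesla has 16 KB of shared memory, used for communication between threads in the same thread block. So that the threads of a half-warp can access it in parallel within one core clock cycle, shared memory is organized into 16 banks, each 32 bits wide, so each bank can hold 256 integers or single-precision floating-point numbers, or in other words the current ba.. Read more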
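The bank layout described above can be checked with a small host-side sketch in plain C (this is a model, not GPU code). It assumes the usual mapping in which consecutive 32-bit words fall into consecutive banks, and it counts how many threads of a 16-thread half-warp land in the same bank when thread `t` reads word `t * stride`. The names `bank_of_word`, `conflict_degree`, and `words_per_bank` are hypothetical, and the model does not cover the broadcast case where all threads read the same word.

```c
enum {
    NUM_BANKS = 16,              /* banks per SM, from the abstract above */
    BANK_WIDTH_BYTES = 4,        /* each bank is 32 bits wide */
    HALF_WARP = 16,              /* threads that access shared memory together */
    SHARED_MEM_BYTES = 16 * 1024 /* 16 KB of shared memory per SM */
};

/* Assumed mapping: consecutive 32-bit words go to consecutive banks. */
static int bank_of_word(unsigned word_index) {
    return (int)(word_index % NUM_BANKS);
}

/* Number of 32-bit words each bank holds: 16 KB / 16 banks / 4 bytes. */
static int words_per_bank(void) {
    return SHARED_MEM_BYTES / NUM_BANKS / BANK_WIDTH_BYTES;
}

/* Largest number of half-warp threads hitting one bank when thread t reads
   word t * stride (stride >= 1). 1 means conflict-free, n means n-way. */
static int conflict_degree(unsigned stride) {
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (unsigned t = 0; t < HALF_WARP; ++t) {
        int b = bank_of_word(t * stride);
        if (++hits[b] > worst) {
            worst = hits[b];
        }
    }
    return worst;
}
```

Under this model a stride of 1 is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 16 sends every thread to bank 0 for a 16-way conflict. Odd strides stay conflict-free, which is the reasoning behind padding a 16-wide shared-memory tile to 17 columns in matrix transpose kernels.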

posted @ 2013-02-19 14:04 waytofall Views(3351) Comments(0) Recommended(0)

[Z]OpenCL Data Parallel Primitives Library

Abstract: http://code.google.com/p/clpp/ Read more

posted @ 2013-02-18 22:00 waytofall Views(153) Comments(0) Recommended(0)

[z]Apple's OpenCL implementation of Parallel Prefix Sum

Abstract: http://developer.apple.com/library/mac/#samplecode/OpenCL_Parallel_Prefix_Sum_Example/Introduction/Intro.html#//apple_ref/doc/uid/DTS40008183-Intro-DontLinkElementID_2 Read more

posted @ 2013-02-18 21:52 waytofall Views(283) Comments(0) Recommended(0)

[z]Many basic parallel algorithms implemented by NVIDIA in OpenCL

Abstract: http://developer.download.nvidia.com/compute/cuda/4_2/rel/sdk/website/OpenCL/html/samples.html Read more

posted @ 2013-02-18 21:42 waytofall Views(331) Comments(0) Recommended(0)

[z]A CUDA-based library of basic parallel algorithms

Abstract: http://code.google.com/p/cudpp/ Read more

posted @ 2013-02-18 16:29 waytofall Views(189) Comments(0) Recommended(0)

Simple & Naive

Get your hands dirty !
