Compiling and linking with NVCC

https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/

1. Compiling:

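# Note: sm_20 follows the original post; CUDA 9 and later no longer accept it,
# so substitute the architecture of your own GPU.
objects = main.o particle.o v3.o

# Link step: nvcc performs the device link and the host link in one go.
all: $(objects)
	nvcc -arch=sm_20 $(objects) -o app

# -x cu treats the .cpp files as CUDA sources; -dc emits relocatable device code.
%.o: %.cpp
	nvcc -x cu -arch=sm_20 -I. -dc $< -o $@

clean:
	rm -f *.o app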

2. Linking:
nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o

g++ gpuCode.o main.o particle.o v3.o -lcudart -o app




NVCC compile options that control floating-point precision

--use_fast_math (-use_fast_math)
Make use of fast math library. '--use_fast_math' implies '--ftz=true --prec-div=false
--prec-sqrt=false --fmad=true'.

--ftz {true|false} (-ftz)
This option controls single-precision denormals support. '--ftz=true' flushes
denormal values to zero and '--ftz=false' preserves denormal values. '--use_fast_math'
implies '--ftz=true'.
Default value: false.
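
To make the effect of flushing concrete, here is a minimal host-side C++ sketch. It does not invoke nvcc or run on the GPU. `flush_to_zero` and `smallest_denormal` are hypothetical helpers written for this note (not CUDA APIs); they only imitate on the CPU what '--ftz=true' is described as doing to single-precision denormal values.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not a CUDA API): imitates flush-to-zero on the host.
// Denormal (subnormal) inputs become a zero of the same sign; every other
// value is returned unchanged.
inline float flush_to_zero(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}

// Smallest positive denormal float, which lies below the smallest normal value.
inline float smallest_denormal() {
    return std::numeric_limits<float>::denorm_min();
}
```

With denormals preserved, `smallest_denormal()` is a tiny nonzero number; after `flush_to_zero` it compares equal to zero, which is the kind of change to expect in results that pass through the denormal range.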

--prec-div {true|false} (-prec-div)
This option controls single-precision floating-point division and reciprocals.
'--prec-div=true' enables the IEEE round-to-nearest mode and '--prec-div=false'
enables the fast approximation mode. '--use_fast_math' implies '--prec-div=false'.
Default value: true.
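
As a rough host-side analogy for why an approximate division can differ from the IEEE round-to-nearest result, the C++ sketch below compares `a / b` with `a * (1 / b)`. This is only an illustration of an extra rounding step on the CPU; it is an assumption-laden stand-in, not the instruction sequence nvcc emits for '--prec-div=false'. Both function names are made up for this example.

```cpp
#include <cassert>

// Correctly rounded single-precision division (a single rounding step).
inline float divide_ieee(float a, float b) {
    return a / b;
}

// Approximation: round the reciprocal first, then round the product.
// The volatile store keeps the compiler from folding this back into a / b.
inline float divide_via_reciprocal(float a, float b) {
    volatile float reciprocal = 1.0f / b;
    return a * reciprocal;
}
```

For a = 5 and b = 3 the two functions return neighboring floats that differ in the last bit, while for a = 8 and b = 2 they agree because the reciprocal is exact.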

--prec-sqrt {true|false} (-prec-sqrt)
This option controls single-precision floating-point square root. '--prec-sqrt=true'
enables the IEEE round-to-nearest mode and '--prec-sqrt=false' enables the
fast approximation mode. '--use_fast_math' implies '--prec-sqrt=false'.
Default value: true.

--fmad {true|false} (-fmad)
This option enables (disables) the contraction of floating-point multiplies
and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
or DFMA). '--use_fast_math' implies '--fmad=true'.
Default value: true.
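
The difference between a contracted multiply-add and separate operations can be sketched on the host with `std::fma`, which rounds once, against a multiply followed by an add, which rounds twice. This is a CPU-side C++ illustration with hypothetical function names; it does not show the code nvcc generates.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round to float, then add and round again (two roundings).
// The volatile store prevents the host compiler from fusing the two steps.
inline float multiply_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is formed exactly and rounded once.
inline float fused_multiply_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Sample input a = 1 + 2^-12, so a * a = 1 + 2^-11 + 2^-24 exactly.
// The 2^-24 term does not fit in a float and is lost when a * a is rounded.
inline float sample_input() {
    return 1.0f + std::ldexp(1.0f, -12);
}
```

With a = `sample_input()` and c = -(1 + 2^-11), the separate version returns 0 because the 2^-24 term was rounded away, while the fused version returns 2^-24. Toggling contraction can therefore change results in the last bits, which matters when comparing GPU output against a CPU reference.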

 
posted @ 2020-02-13 21:30  洛笔达