Normalizing a Vector with SSE

            mov    esi,   this            ; vector u
            movups xmm0,  [esi]           ; load the vector into xmm0 (unaligned load)
            movaps xmm2,  xmm0            ; copy original vector
            mulps  xmm0,  xmm0            ; square each component
            movaps xmm1,  xmm0            ; copy result
            shufps xmm1,  xmm1, 4Eh       ; shuffle: f1,f0,f3,f2
            addps  xmm0,  xmm1            ; add: f3+f1,f2+f0,f1+f3,f0+f2
            movaps xmm1,  xmm0            ; copy results
            shufps xmm1,  xmm1, 11h       ; shuffle: f0+f2,f1+f3,f0+f2,f1+f3
            addps  xmm0,  xmm1            ; add: all four lanes = f0+f1+f2+f3

            rsqrtps xmm0,  xmm0           ; approx. recip. sqrt in all lanes (faster than rsqrtss + shufps)
            mulps   xmm2,  xmm0           ; scale original vector by 1/length
            movups  [esi], xmm2           ; store result back

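Core idea: fill all four lanes of an xmm register with the squared length x*x + y*y + z*z, use rsqrtps to get the reciprocal of the vector length, then multiply it by the original vector saved in xmm2 to complete the normalization. Note that the horizontal sum adds all four squared lanes (f0..f3), so for a 3D vector the fourth component w must be 0, otherwise w*w is included in the length.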
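The same sequence can be sketched in C with SSE intrinsics, which may be easier to step through than the assembly. This is only an illustrative translation, not the author's code: the function name `normalize4_sse` is hypothetical, and it assumes the vector is stored as four floats with w = 0.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: normalize v[0..3] in place, mirroring the assembly
   listing step by step. Assumes v[3] (w) is 0 for a 3D vector. */
static inline void normalize4_sse(float v[4])
{
    __m128 u  = _mm_loadu_ps(v);              /* movups: unaligned load             */
    __m128 sq = _mm_mul_ps(u, u);             /* mulps: square each lane            */
    __m128 t  = _mm_shuffle_ps(sq, sq, 0x4E); /* shufps 4Eh: swap the two halves    */
    __m128 s  = _mm_add_ps(sq, t);            /* lanes (low to high): a, b, a, b    */
    t         = _mm_shuffle_ps(s, s, 0x11);   /* shufps 11h: lanes become b, a, b, a */
    s         = _mm_add_ps(s, t);             /* every lane now holds the full sum  */
    __m128 inv = _mm_rsqrt_ps(s);             /* rsqrtps: approximate 1/sqrt(sum)   */
    _mm_storeu_ps(v, _mm_mul_ps(u, inv));     /* mulps + movups: scale and store    */
}
```

For the input (3, 0, 4, 0) the result is approximately (0.6, 0, 0.8, 0), within the precision of the rsqrtps estimate. A zero-length input is not handled: the reciprocal square root of 0 is infinity, and 0 times infinity produces NaN.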

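rsqrtps uses a lookup-based hardware approximation of the reciprocal square root, which trades precision for speed (the relative error is bounded by 1.5 * 2^-12). rsqrtss is the scalar form of the same approximation, so it is not more accurate. For a full-precision result, use sqrtps followed by divps, or refine the rsqrtps estimate with a Newton-Raphson step.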
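When the rsqrtps estimate is not accurate enough, two common options are a single Newton-Raphson refinement step or an exact computation with sqrtps and divps. The sketch below shows both in C intrinsics; the function names `rsqrt_refined` and `rsqrt_exact` are hypothetical and are not part of the original listing.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: rsqrtps estimate followed by one Newton-Raphson step,
   y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which roughly doubles the number of
   correct bits. */
static inline __m128 rsqrt_refined(__m128 x)
{
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y0  = _mm_rsqrt_ps(x);                     /* approximate 1/sqrt(x) */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y0), y0);   /* x * y0 * y0           */
    return _mm_mul_ps(y0, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

/* Hypothetical helper: full single precision via sqrtps + divps (slower). */
static inline __m128 rsqrt_exact(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}
```

Either function could stand in for the rsqrtps step of the normalization. The refined version usually stays much cheaper than the divide, but like the raw estimate it yields NaN for a zero input.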

posted @ 2014-06-07 14:58  xxx1