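Operator Development: Ascend C Operator Intermediate Micro-Certification
1. Goal
- Implement the Ascend C operator Sinh, named SinhCustom: write its kernel-side and host-side code, and complete the aclnn operator invocation test.
 - Formula: sinh(x) = (exp(x) - exp(-x)) / 2.0
 - Using the officially provided server environment, project code, and verification procedure, submit the project as a zip package on the web page; results are available 5 to 10 minutes after submission.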
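The target formula can be sanity-checked on the host before any kernel work. A minimal C++ sketch, assuming double precision on the CPU; `sinh_ref` and `matches_std_sinh` are hypothetical helper names, not part of the exam project:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side reference for the target formula:
// sinh(x) = (exp(x) - exp(-x)) / 2.0, evaluated in double precision.
double sinh_ref(double x) {
    return (std::exp(x) - std::exp(-x)) / 2.0;
}

// Returns true when sinh_ref agrees with std::sinh within an absolute tolerance.
bool matches_std_sinh(double x, double tol = 1e-12) {
    assert(tol > 0.0);
    return std::fabs(sinh_ref(x) - std::sinh(x)) <= tol;
}
```

For example, `matches_std_sinh(1.0)` should hold, and `sinh_ref(-x)` equals `-sinh_ref(x)` because the formula is odd in x.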
 
2. Environment notes
- Changes in SinhCustom/CMakePresets.json:
 
ASCEND_CANN_PACKAGE_PATH: "/home/ma-user/Ascend/ascend-toolkit/latest"
ASCEND_COMPUTE_UNIT: "ascend910b" (the officially provided NPU is an Ascend910B4)
- In SinhCustom/op_host/sinh_custom.cpp, change the operator registration section as follows:
 
this->AICore().AddConfig("ascend910b");
3. Programming notes
- SinhCustom/op_host/sinh_custom.cpp: do not call context->SetTilingKey(1); in TilingFunc. Doing so causes a stream synchronization error: Synchronize stream failed. error code is 507035.
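The kernel code in section 4 indexes tiles with `tileLength` and `progress`, which are derived from the tiling data that TilingFunc produces. As a rough illustration only, the arithmetic such a TilingFunc might perform can be written as plain C++; the `TilingParams` struct, the `compute_tiling` helper, and the double-buffer factor are assumptions modeled on common Ascend C vector-operator samples, not the exam project's code, and the sketch does not set any tiling key.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiling parameters; names mirror the kernel's tileLength but are
// assumptions, not the exam project's TilingData definition.
struct TilingParams {
    uint32_t blockLength;  // elements handled by one AI Core
    uint32_t tileLength;   // elements per CopyIn/Compute/CopyOut round
    uint32_t loopCount;    // number of rounds (values of `progress`) per core
};

// Splits totalLength elements across blockDim cores, then each core's share
// into tileNum tiles, each tile further split by bufferNum (2 = double buffering).
// Assumes everything divides evenly, which the assert enforces.
TilingParams compute_tiling(uint32_t totalLength, uint32_t blockDim,
                            uint32_t tileNum, uint32_t bufferNum) {
    assert(blockDim > 0 && tileNum > 0 && bufferNum > 0);
    assert(totalLength % (blockDim * tileNum * bufferNum) == 0);
    TilingParams p{};
    p.blockLength = totalLength / blockDim;
    p.tileLength = p.blockLength / tileNum / bufferNum;
    p.loopCount = tileNum * bufferNum;
    return p;
}
```

For example, 16384 elements on 8 cores with 8 tiles and double buffering give 2048 elements per core, 128 elements per tile, and 16 loop iterations per core.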
 
4. Main source code
- CopyIn
 
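    __aicore__ inline void CopyIn(int32_t progress)
    {
        // Candidate-supplied operator code
        // Allocate xLocal in local memory
        LocalTensor<half> xLocal = inQueueX.AllocTensor<half>();
        // Copy tile `progress` from global memory into local memory
        DataCopy(xLocal, xGm[progress * this->tileLength], this->tileLength);
        // Enqueue the local tensor for Compute
        inQueueX.EnQue(xLocal);
    }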
- Compute
 
    __aicore__ inline void Compute(int32_t progress)
    {
        // Candidate-supplied operator computation
        // Dequeue xLocal for computation
        LocalTensor<half> xLocal = inQueueX.DeQue<half>();
        // Allocate yLocal
        LocalTensor<half> yLocal = outQueueY.AllocTensor<half>();
        Exp(xLocal, xLocal, this->tileLength);              // xLocal = exp(x)
        Reciprocal(yLocal, xLocal, this->tileLength);       // yLocal = 1 / exp(x) = exp(-x)
        Sub(yLocal, xLocal, yLocal, this->tileLength);      // yLocal = exp(x) - exp(-x)
        Muls(yLocal, yLocal, (half)0.5, this->tileLength);  // yLocal = (exp(x) - exp(-x)) / 2
        outQueueY.EnQue<half>(yLocal);
        // Free xLocal
        inQueueX.FreeTensor(xLocal);
    }
- CopyOut
 
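    __aicore__ inline void CopyOut(int32_t progress)
    {
        // Candidate-supplied operator code
        // Dequeue the computed yLocal
        LocalTensor<half> yLocal = outQueueY.DeQue<half>();
        // Copy tile `progress` from local memory back to global memory
        DataCopy(yGm[progress * this->tileLength], yLocal, this->tileLength);
        // Free yLocal
        outQueueY.FreeTensor(yLocal);
    }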
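Taken together, CopyIn, Compute, and CopyOut run once per `progress` value (presumably from a Process loop in the project template, not shown here), and CopyOut writes each tile back to the same offset CopyIn read it from. The kernel derives exp(-x) as 1 / exp(x), so only one Exp is needed per element. The host-side C++ sketch below replays that flow on `float` data, with each vector instruction emulated as an element-wise loop; `compute_tile`, `run_pipeline`, and `all_close` are hypothetical helpers, and `std::vector<float>` merely stands in for the real global/local `half` tensors and queues, which exist only in the Ascend C runtime.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical emulation of Compute on one tile. Each loop mirrors one
// Ascend C vector instruction applied to the whole tile.
void compute_tile(std::vector<float>& xLocal, std::vector<float>& yLocal) {
    assert(xLocal.size() == yLocal.size());
    const std::size_t n = xLocal.size();
    for (std::size_t i = 0; i < n; ++i) xLocal[i] = std::exp(xLocal[i]);    // Exp:        x <- e^x
    for (std::size_t i = 0; i < n; ++i) yLocal[i] = 1.0f / xLocal[i];       // Reciprocal: y <- e^-x
    for (std::size_t i = 0; i < n; ++i) yLocal[i] = xLocal[i] - yLocal[i];  // Sub:        y <- e^x - e^-x
    for (std::size_t i = 0; i < n; ++i) yLocal[i] = yLocal[i] * 0.5f;       // Muls:       y <- y * 0.5
}

// Hypothetical emulation of the CopyIn -> Compute -> CopyOut loop over a whole
// buffer: tile `progress` is read from and written back to offset
// progress * tileLength. Queues and double buffering are not modeled.
std::vector<float> run_pipeline(const std::vector<float>& xGm, uint32_t tileLength) {
    assert(tileLength > 0 && xGm.size() % tileLength == 0);
    const int32_t loopCount = static_cast<int32_t>(xGm.size() / tileLength);
    std::vector<float> yGm(xGm.size());
    std::vector<float> xLocal(tileLength), yLocal(tileLength);
    for (int32_t progress = 0; progress < loopCount; ++progress) {
        const std::size_t offset = static_cast<std::size_t>(progress) * tileLength;
        for (uint32_t i = 0; i < tileLength; ++i) xLocal[i] = xGm[offset + i];  // CopyIn
        compute_tile(xLocal, yLocal);                                          // Compute
        for (uint32_t i = 0; i < tileLength; ++i) yGm[offset + i] = yLocal[i];  // CopyOut
    }
    return yGm;
}

// Element-wise check: |a - b| <= atol + rtol * |b| for every element.
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               float rtol, float atol) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) return false;
    return true;
}
```

With `float` data the emulated output should track `std::sinh` closely. On the NPU the `half` results will need a looser tolerance, and once |x| exceeds about 11.09 either exp(x) or its reciprocal is larger than the largest finite half value (65504), so inputs in that range may not produce a finite result.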
5. Result verification
