CTA: Cooperative Thread Array
i.e., a CUDA thread block
https://github.com/NVIDIA/cuda-samples/blob/2e41896e1b2c7e2699b7b7f6689c107900c233bb/Samples/3_CUDA_Features/cudaTensorCoreGemm/cudaTensorCoreGemm.cu
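A minimal sketch of what "cooperative" means for a CTA, not taken from the linked sample. Each thread block (CTA) sums its own chunk of an array. Its threads share a `__shared__` buffer and synchronize with `__syncthreads()`, both of which are scoped to one CTA. The names `ctaSum`, `ctaSums`, and `kThreadsPerCta` are hypothetical. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical CTA size: threads per block. Must be a power of two for the tree reduction.
constexpr int kThreadsPerCta = 256;

// Each CTA (one CUDA thread block) reduces one kThreadsPerCta-element chunk of `in`.
__global__ void ctaSum(const int* in, int* out, int n) {
    __shared__ int partial[kThreadsPerCta];    // shared only among threads of this CTA
    int tid = threadIdx.x;                      // thread index within the CTA
    int gid = blockIdx.x * blockDim.x + tid;    // blockIdx.x selects the CTA within the grid
    partial[tid] = (gid < n) ? in[gid] : 0;
    __syncthreads();                            // barrier across all threads of this CTA

    // Tree reduction inside the CTA; every thread runs the same number of iterations,
    // so the __syncthreads() in the loop is reached uniformly.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = partial[0]; // one partial sum per CTA
}

// Hypothetical host helper: launches one CTA per chunk and returns the per-CTA sums.
std::vector<int> ctaSums(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    if (n == 0) return {};                      // a launch with zero CTAs is invalid
    int numCtas = (n + kThreadsPerCta - 1) / kThreadsPerCta;

    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, numCtas * sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Grid of numCtas CTAs, each with kThreadsPerCta threads.
    ctaSum<<<numCtas, kThreadsPerCta>>>(dIn, dOut, n);

    std::vector<int> result(numCtas);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), dOut, numCtas * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

On a machine with a CUDA device, `ctaSums(std::vector<int>(1000, 1))` launches 4 CTAs and returns `{256, 256, 256, 232}`. The last CTA covers only 1000 - 3 * 256 = 232 valid elements and pads the rest with zeros. Threads in different CTAs cannot see each other's `partial` buffer, so combining the per-CTA results needs a second step, either on the host or in another kernel.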