Abstract:
I recently noticed that in some Caffe models, the learning rate for the bias term is usually set to twice that of the weights. For the specific reasoning, see (https://datascience.stackexchange.com/questions/23549/why-is-the-learning-rate-for-the-bias-usually-twice Read more
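In Caffe this is expressed through per-parameter `lr_mult` multipliers in the layer definition; a minimal prototxt sketch (layer name and shapes are illustrative):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # first param block: weights -> effective lr = base_lr * 1
  param { lr_mult: 1 }
  # second param block: bias -> effective lr = base_lr * 2
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    kernel_size: 3
  }
}
```

The bias has far fewer parameters than the weight matrix and no weight decay is typically applied to it, so a larger step size for the bias is a common heuristic rather than a strict requirement.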
Abstract:
The NumPy documentation describes the array broadcasting mechanism as: When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way Read more
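A minimal sketch of the trailing-dimension rule: two dimensions are compatible when they are equal or one of them is 1, and a size-1 axis is stretched to match.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,), treated as (1, 3)

# trailing dims: 3 vs 3 match; b's missing leading dim acts as 1
c = a + b                        # result shape (2, 3)
print(c)                         # [[10 21 32] [13 24 35]]

# a size-1 axis is also stretched: (2, 3) + (2, 1) -> (2, 3)
d = a + np.array([[100], [200]])
print(d.shape)                   # (2, 3)
```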