Generally, a good way to avoid this ordering bias in stochastic gradient descent is to randomly shuffle the training data prior to each epoch of training.
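A minimal sketch of this idea (all variable names and hyperparameters here are illustrative assumptions, not from the tutorial): SGD on a least-squares problem, drawing a fresh random permutation of the data at the start of every epoch.

```python
import numpy as np

# Hypothetical toy data: linear model y = X @ true_w plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.05  # assumed learning rate for this sketch
for epoch in range(20):
    perm = rng.permutation(len(X))       # fresh random order each epoch
    for i in perm:
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x·w - y)^2
        w -= lr * grad

print(w)  # ends up close to true_w
```

Reshuffling each epoch ensures no example is visited in a fixed position relative to the others, so the gradient noise stays unbiased across epochs.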
http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

posted @ 2017-09-27 12:08 papering