A generally good way to avoid this is to randomly shuffle the training data before each epoch of training.
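As a minimal sketch of this idea, the loop below re-shuffles the example indices at the start of every epoch and then iterates over mini-batches; the data, batch size, and epoch count are hypothetical placeholders, and the gradient update itself is elided.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100 samples with 3 features (illustrative values only).
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

batch_size = 10
for epoch in range(5):
    # Shuffle indices before each epoch so the mini-batches
    # (and hence the gradient estimates) differ across epochs.
    perm = rng.permutation(len(X))
    X_shuffled, y_shuffled = X[perm], y[perm]
    for start in range(0, len(X), batch_size):
        xb = X_shuffled[start:start + batch_size]
        yb = y_shuffled[start:start + batch_size]
        # ... compute the gradient on (xb, yb) and update parameters here ...
```

Shuffling indices rather than copying the arrays in place keeps the pairing between each sample and its label intact while still changing the order examples are visited in.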
http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/