CS231n 2016 通关 第三章-Softmax 作业

在完成SVM作业的基础上,Softmax的作业相对比较轻松。

完成本作业需要熟悉与掌握的知识:

cell 1 设置绘图默认参数

 1 mport random
 2 import numpy as np
 3 from cs231n.data_utils import load_CIFAR10
 4 import matplotlib.pyplot as plt
 5 %matplotlib inline
 6 plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
 7 plt.rcParams['image.interpolation'] = 'nearest'
 8 plt.rcParams['image.cmap'] = 'gray'
 9 
10 # for auto-reloading extenrnal modules
11 # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
12 %load_ext autoreload
13 %autoreload 2

cell 2 读取数据,并显示各个数据的尺寸:

 1 def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
 2   """
 3   Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
 4   it for the linear classifier. These are the same steps as we used for the
 5   SVM, but condensed to a single function.  
 6   """
 7   # Load the raw CIFAR-10 data
 8   cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
 9   X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
10   
11   # subsample the data
12   mask = range(num_training, num_training + num_validation)
13   X_val = X_train[mask]
14   y_val = y_train[mask]
15   mask = range(num_training)
16   X_train = X_train[mask]
17   y_train = y_train[mask]
18   mask = range(num_test)
19   X_test = X_test[mask]
20   y_test = y_test[mask]
21   mask = np.random.choice(num_training, num_dev, replace=False)
22   X_dev = X_train[mask]
23   y_dev = y_train[mask]
24   
25   # Preprocessing: reshape the image data into rows
26   X_train = np.reshape(X_train, (X_train.shape[0], -1))
27   X_val = np.reshape(X_val, (X_val.shape[0], -1))
28   X_test = np.reshape(X_test, (X_test.shape[0], -1))
29   X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
30   
31   # Normalize the data: subtract the mean image
32   mean_image = np.mean(X_train, axis = 0)
33   X_train -= mean_image
34   X_val -= mean_image
35   X_test -= mean_image
36   X_dev -= mean_image
37   
38   # add bias dimension and transform into columns
39   X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
40   X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
41   X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
42   X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
43   
44   return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
45 
46 
47 # Invoke the above function to get our data.
48 X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
49 print 'Train data shape: ', X_train.shape
50 print 'Train labels shape: ', y_train.shape
51 print 'Validation data shape: ', X_val.shape
52 print 'Validation labels shape: ', y_val.shape
53 print 'Test data shape: ', X_test.shape
54 print 'Test labels shape: ', y_test.shape
55 print 'dev data shape: ', X_dev.shape
56 print 'dev labels shape: ', y_dev.shape

数据维度结果:

cell 3 用for循环实现Softmax的loss function 与grad:

 1 # First implement the naive softmax loss function with nested loops.
 2 # Open the file cs231n/classifiers/softmax.py and implement the
 3 # softmax_loss_naive function.
 4 
 5 from cs231n.classifiers.softmax import softmax_loss_naive
 6 import time
 7 
 8 # Generate a random softmax weight matrix and use it to compute the loss.
 9 W = np.random.randn(3073, 10) * 0.0001
10 loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
11 
12 # As a rough sanity check, our loss should be something close to -log(0.1).
13 print 'loss: %f' % loss
14 print 'sanity check: %f' % (-np.log(0.1))

对应的py文件的代码:

 1 def softmax_loss_naive(W, X, y, reg):
 2   """
 3   Softmax loss function, naive implementation (with loops)
 4 
 5   Inputs have dimension D, there are C classes, and we operate on minibatches
 6   of N examples.
 7 
 8   Inputs:
 9   - W: A numpy array of shape (D, C) containing weights.
10   - X: A numpy array of shape (N, D) containing a minibatch of data.
11   - y: A numpy array of shape (N,) containing training labels; y[i] = c means
12     that X[i] has label c, where 0 <= c < C.
13   - reg: (float) regularization strength
14 
15   Returns a tuple of:
16   - loss as single float
17   - gradient with respect to weights W; an array of same shape as W
18   """
19   # Initialize the loss and gradient to zero.
20   loss = 0.0
21   dW = np.zeros_like(W)
22 
23   #############################################################################
24   # TODO: Compute the softmax loss and its gradient using explicit loops.     #
25   # Store the loss in loss and the gradient in dW. If you are not careful     #
26   # here, it is easy to run into numeric instability. Don't forget the        #
27   # regularization!                                                           #
28   #############################################################################
29   num_calss = W.shape[1]
30   num_train = X.shape[0]
31   buf_e = np.zeros(num_calss)
32 #print buf_e.shape
33 
34   for i in xrange(num_train) :
35     for j in xrange(num_calss) :
36   #1*3073 * 3073*1 = 1 >>>10
37       buf_e[j] = np.dot(X[i,:],W[:,j])
38     buf_e -= np.max(buf_e)
39     buf_e = np.exp(buf_e)
40     buf_sum = np.sum(buf_e)    
41     buf = buf_e/ buf_sum
42     loss -= np.log(buf[y[i]] )
43     for j in xrange(num_calss):
44         dW[:,j] +=( buf[j] - (j ==y[i]) )*X[i,:].T 
45   #regularization with elementwise production
46   loss /= num_train
47   dW /= num_train
48 
49   loss += 0.5 * reg * np.sum(W * W)
50   dW +=reg*W
51    #gradient 
52    
53   #############################################################################
54   #                          END OF YOUR CODE                                 #
55   #############################################################################
56 
57   return loss, dW

计算得到的结果:

使用了课程上所讲的验证方式。

问题:

 

cell 4 使用数值计算法对解析法得到的grad进行检验:

 1 # Complete the implementation of softmax_loss_naive and implement a (naive)
 2 # version of the gradient that uses nested loops.
 3 loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
 4 
 5 # As we did for the SVM, use numeric gradient checking as a debugging tool.
 6 # The numeric gradient should be close to the analytic gradient.
 7 from cs231n.gradient_check import grad_check_sparse
 8 f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
 9 grad_numerical = grad_check_sparse(f, W, grad, 10)
10 
11 # similar to SVM case, do another gradient check with regularization
12 loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
13 f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
14 grad_numerical = grad_check_sparse(f, W, grad, 10)

计算结果:

cell 5 使用向量法来实现loss funvtion与grad,并与使用for循环法比较:

 1 # Now that we have a naive implementation of the softmax loss function and its gradient,
 2 # implement a vectorized version in softmax_loss_vectorized.
 3 # The two versions should compute the same results, but the vectorized version should be
 4 # much faster.
 5 tic = time.time()
 6 loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
 7 toc = time.time()
 8 print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)
 9 
10 from cs231n.classifiers.softmax import softmax_loss_vectorized
11 tic = time.time()
12 loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
13 toc = time.time()
14 print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)
15 
16 # As we did for the SVM, we use the Frobenius norm to compare the two versions
17 # of the gradient.
18 grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
19 print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
20 print 'Gradient difference: %f' % grad_difference

比较的结果:

向量法的具体代码实现:

 1 def softmax_loss_vectorized(W, X, y, reg):
 2   """
 3   Softmax loss function, vectorized version.
 4 
 5   Inputs and outputs are the same as softmax_loss_naive.
 6   """
 7   # Initialize the loss and gradient to zero.
 8   loss = 0.0
 9   dW = np.zeros_like(W)
10   num_calss = W.shape[1]
11   num_train = X.shape[0]
12   #############################################################################
13   # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
14   # Store the loss in loss and the gradient in dW. If you are not careful     #
15   # here, it is easy to run into numeric instability. Don't forget the        #
16   # regularization!                                                           #
17   #############################################################################
18   #500*3073 3073*10 >>>500*10
19   buf_e  = np.dot(X,W)
20   # 10 * 500 - 1*500   T
21   buf_e = np.subtract( buf_e.T , np.max(buf_e , axis = 1) ).T
22   buf_e = np.exp(buf_e)
23   #10*500 - 1*500 T
24   buf_e = np.divide( buf_e.T , np.sum(buf_e , axis = 1) ).T
25   #get loss
26   #print buf.shape
27   loss = - np.sum(np.log ( buf_e[np.arange(num_train),y] ) ) 
28   #get grad 
29   buf_e[np.arange(num_train),y]  -= 1   
30   # 3073 * 500 * 500*10
31   loss /=num_train  + 0.5 * reg * np.sum(W * W)
32   dW = np.dot(X.T,buf_e)/num_train + reg*W
33     
34   #############################################################################
35   #                          END OF YOUR CODE                                 #
36   #############################################################################
37 
38   return loss, dW

cell 6 使用验证集与训练集做超参数选取:

 1 # Use the validation set to tune hyperparameters (regularization strength and
 2 # learning rate). You should experiment with different ranges for the learning
 3 # rates and regularization strengths; if you are careful you should be able to
 4 # get a classification accuracy of over 0.35 on the validation set.
 5 from cs231n.classifiers import Softmax
 6 results = {}
 7 best_val = -1
 8 best_softmax = None
 9 learning_rates = np.logspace(-10, 10, 10)# [1e-7, 2e-7,3e-7,4e-7,5e-7]
10 regularization_strengths = np.logspace(-3, 6, 10) #[1e4,5e4,1e5,5e5,1e6,5e6,1e7,5e7,1e8]
11 
12 ################################################################################
13 # TODO:                                                                        #
14 # Use the validation set to set the learning rate and regularization strength. #
15 # This should be identical to the validation that you did for the SVM; save    #
16 # the best trained softmax classifer in best_softmax.                          #
17 ################################################################################
18 iters = 1500 
19 for lr in learning_rates:
20     for rs in regularization_strengths:
21         softmax = Softmax()
22         softmax.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=iters)
23         y_train_pred = softmax.predict(X_train)
24         accu_train = np.mean(y_train == y_train_pred)
25         y_val_pred = softmax.predict(X_val)
26         accu_val = np.mean(y_val == y_val_pred)
27         
28         results[(lr, rs)] = (accu_train, accu_val)
29         
30         if best_val < accu_val:
31             best_val = accu_val
32             best_softmax = softmax
33 ################################################################################
34 #                              END OF YOUR CODE                                #
35 ################################################################################
36     
37 # Print out results.
38 for lr, reg in sorted(results):
39     train_accuracy, val_accuracy = results[(lr, reg)]
40     print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
41                 lr, reg, train_accuracy, val_accuracy)
42     
43 print 'best validation accuracy achieved during cross-validation: %f' % best_val

得到较好的结果:

cell 7 选取较好的超参数的模型,对测试集进行测试,计算准确率:

1 # evaluate on test set
2 # Evaluate the best softmax on test set
3 y_test_pred = best_softmax.predict(X_test)
4 test_accuracy = np.mean(y_test == y_test_pred)
5 print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )

结果:0.378

cell 8 可视化w值:

 1 # Visualize the learned weights for each class
 2 w = best_softmax.W[:-1,:] # strip out the bias
 3 w = w.reshape(32, 32, 3, 10)
 4 
 5 w_min, w_max = np.min(w), np.max(w)
 6 
 7 classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
 8 for i in xrange(10):
 9   plt.subplot(2, 5, i + 1)
10   
11   # Rescale the weights to be between 0 and 255
12   wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
13   plt.imshow(wimg.astype('uint8'))
14   plt.axis('off')
15   plt.title(classes[i])

结果:

注:在softmax 与svm的超参数选择时,使用了共同的类,以及类中不同的相应的方法。具体的文件内容与注释如下

附:通关CS231n企鹅群:578975100 validation:DL-CS231n 

posted @ 2016-05-26 10:00 AIengineer 阅读(...) 评论(...) 编辑 收藏