Sequence Models - Week 2 Programming Assignment 2 (Building an Emojifier with Word Vectors)

Build a model that takes a sentence as input (for example, "Let's go see the baseball game tonight!") and outputs an emoji (⚾️). Using word vectors, you will find that even if the training set explicitly associates only a few words with a particular emoji, the model can generalize words in the test set to the same emoji, even for words that never appear in the training set.

1. Dataset preprocessing

The dataset (X, Y):

import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt

X_train, Y_train = read_csv('datasets/train_emoji.csv')    # (132,)
X_test, Y_test = read_csv('datasets/test.csv')             # (56,)

# Number of words in the longest training sentence
maxLen = len(max(X_train, key=len).split())

# Look at a few training examples
for index in range(5):
    print(X_train[index], Y_train[index], label_to_emoji(Y_train[index]))

never talk to me again 3 😞
I am proud of your achievements 2 😄
It is the worst day in my life 3 😞
Miss you so much 0 ❤️
food is life 4 🍴

2. Implementing Emojifier-V1

The baseline model is as follows:

The input to the model is a sentence (e.g. "I love you") and the output is a vector of shape (1, 5) giving the probabilities of the 5 possible emojis; an argmax layer then selects the most likely one.
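As a minimal sketch of that final argmax step (probs below is a made-up (1, 5) softmax output; label_to_emoji comes from emo_utils):

import numpy as np
probs = np.array([[0.1, 0.05, 0.7, 0.05, 0.1]])   # hypothetical softmax output over the 5 emojis
print(label_to_emoji(int(np.argmax(probs))))      # index 2 has the highest probability -> 😄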

Convert the labels Y into the format the softmax classifier expects, i.e. from (m, 1) into one-hot encoding (m, 5), where each row is one encoded example; Y_oh stands for "Y one-hot".
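convert_to_one_hot is provided by emo_utils; a minimal sketch of an equivalent implementation (assuming integer labels), consistent with the np.eye check used later in this post:

import numpy as np

def convert_to_one_hot_sketch(Y, C):
    # Row y of the C x C identity matrix is exactly the one-hot vector for label y.
    return np.eye(C)[Y.reshape(-1)]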

# Convert each integer label in Y to a one-hot vector
Y_oh_train = convert_to_one_hot(Y_train, C = 5)  # (132, 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)    # (56, 5)

for index in range(5):
    print(Y_train[index], "is converted into one hot", Y_oh_train[index])

3 is converted into one hot [ 0. 0. 0. 1. 0.]
2 is converted into one hot [ 0. 0. 1. 0. 0.]
3 is converted into one hot [ 0. 0. 0. 1. 0.]
0 is converted into one hot [ 1. 0. 0. 0. 0.]
4 is converted into one hot [ 0. 0. 0. 0. 1.]

2.1 Convert the input sentence to word vectors and average them

# Three dictionaries: {word: index}, {index: word}, {word: embedding vector of that word}
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('datasets/glove.6B.50d.txt')

# Check that the vectors loaded correctly
word = "cucumber"
index = 113317
print("the index of", word, "in the vocabulary is", word_to_index[word])                  # 113317
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])           # 'cucumber'

Implement the sentence_to_avg() function in two steps:

  1. Convert each sentence to lowercase, then split it into a list of words; use X.lower() and X.split().
  2. For each word in the sentence, look up its GloVe vector and average them into a single vector.
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.
    
    Arguments:
    sentence -- string, one training example from X
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    
    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
    """
    ### START CODE HERE ###
    # Step 1: Split sentence into list of lower case words (≈ 1 line)
    words = sentence.lower().split()
    # Initialize the average word vector, should have the same shape as your word vectors.
    avg = np.zeros(50,)
    # Step 2: average the word vectors. You can loop over the words in the list "words".
    for w in words:
        avg += word_to_vec_map[w]
    avg /= len(words)
    ### END CODE HERE ###
    
    return avg

avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)

Output:

1 avg =  [-0.008005    0.56370833 -0.50427333  0.258865    0.55131103  0.03104983
2         -0.21013718  0.16893933 -0.09590267  0.141784   -0.15708967  0.18525867
3         0.6495785   0.38371117  0.21102167  0.11301667  0.02613967  0.26037767
4         0.05820667 -0.01578167 -0.12078833 -0.02471267  0.4128455   0.5152061
5         0.38756167 -0.898661   -0.535145    0.33501167  0.68806933 -0.2156265
6         1.797155    0.10476933 -0.36775333  0.750785    0.10282583  0.348925
7         -0.27262833  0.66768    -0.10706167 -0.283635    0.59580117  0.28747333
8         -0.3366635   0.23393817  0.34349183  0.178405    0.1166155  -0.076433
9         0.1445417   0.09808667]

2.2 Implement the model() function: compute the average, run forward propagation, compute the loss, then backpropagate and update the parameters.

Computation formulas (for the i-th training example):
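These correspond to the softmax forward pass and the cross-entropy loss coded in model() below, where avg^(i) is the averaged word vector of sentence i:

z^(i) = W · avg^(i) + b
a^(i) = softmax(z^(i))
L^(i) = -Σ_k Y_oh^(i)_k · log(a^(i)_k)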

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.
    
    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 4, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes
    n_h = 50                                # dimensions of the GloVe vectors
    
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y)
    
    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples
            
            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)
            # Compute cost using the i'th training label's one hot representation and "a" (the output of the softmax)
            cost = -np.sum(Y_oh[i] * np.log(a))
            ### END CODE HERE ###
            
            # Compute gradients
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b

Train the model and check the results:

print(X_train.shape)                                 # (132,)
print(Y_train.shape)                                 # (132,)
print(np.eye(5)[Y_train.reshape(-1)].shape)          # (132, 5)
print(X_train[0])                                    # never talk to me again
print(type(X_train))                                 # <class 'numpy.ndarray'>

pred, W, b = model(X_train, Y_train, word_to_vec_map)

Epoch: 0 --- cost = 1.95204988128
Accuracy: 0.348484848485
Epoch: 100 --- cost = 0.0797181872601
Accuracy: 0.931818181818
Epoch: 200 --- cost = 0.0445636924368
Accuracy: 0.954545454545
Epoch: 300 --- cost = 0.0343226737879
Accuracy: 0.969696969697

2.3 Evaluating on the test set

On the original training and test sets:

1 print("Training set:")
2 pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
3 print('Test set:')
4 pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143

Manual test with custom sentences:

X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4], [3]])

pred = predict(X_my_sentences, Y_my_labels, W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)

Accuracy: 0.833333333333

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄

Because of the word embeddings, "adore" is similar to "love", so even though "adore" never appears in the training set the model still correctly outputs "❤️". However, "not feeling happy" is misclassified (the model outputs 😄 instead of 😞): the algorithm averages the word vectors and ignores word order, so it is not good at understanding phrases like "not happy".
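A quick way to see that the averaging throws away word order, reusing sentence_to_avg and word_to_vec_map defined above:

v1 = sentence_to_avg("not feeling happy", word_to_vec_map)
v2 = sentence_to_avg("happy feeling not", word_to_vec_map)
print(np.allclose(v1, v2))   # True: reordering the words gives exactly the same representation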

Printing a confusion matrix for the test set helps to see which classes are hardest for the model; the columns are the predictions and the rows are the actual labels:

import pandas as pd    # needed for pd.crosstab below

print(Y_test.shape)                                                    # (56,)
print('           '+ label_to_emoji(0)+ '    ' + label_to_emoji(1) + '    ' +  label_to_emoji(2)+ '    ' + label_to_emoji(3)+'   ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)

Output:

1             ❤️   ⚾  😄   😞   🍴
2 Predicted  0.0  1.0  2.0  3.0  4.0  All
3 Actual                                 
4 0            6    0    0    1    0    7
5 1            0    8    0    0    0    8
6 2            2    0   16    0    0   18
7 3            1    1    2   12    0   16
8 4            0    0    1    0    6    7
9 All          9    9   19   13    6   56

3. Emojifier-V2: Using LSTMs in Keras

This model can handle predictions for phrases like "not happy" that tripped up the model above.

Load the libraries:

import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

The model to implement is shown below (a two-layer LSTM, many-to-one sequence classifier):

Sentences have different numbers of words, so after converting them to vectors they would require different numbers of LSTM steps, and they could not be trained on together in one batch.

The common solution is padding: specify a maximum sentence length and pad every shorter sentence with "0" until it reaches that length; any sentence longer than the maximum (say 20 words) is truncated. A simple way to choose the maximum is to find the longest sentence in the dataset and use its length.
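As an aside, the keras.preprocessing.sequence module imported above provides a utility that does this padding and truncation; a minimal sketch with made-up index lists (the assignment itself implements the padding by hand in sentences_to_indices() below):

from keras.preprocessing import sequence
padded = sequence.pad_sequences([[7, 2], [5, 1, 9], [4, 4, 4, 4, 4, 4]],
                                maxlen=5, padding='post', truncating='post', value=0)
print(padded)
# [[7 2 0 0 0]
#  [5 1 9 0 0]
#  [4 4 4 4 4]]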

3.1 The Embedding layer

This part shows how to create an Embedding layer in Keras, initialized with the GloVe 50-dimensional vectors. Because the training set is small, we will not update the word embeddings; we keep their pre-trained values.

The Embedding layer takes as input an integer matrix of shape (batch size, max input length) and outputs an array of shape (batch size, max input length, word vector dimension).
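A toy shape check of this behavior, unrelated to the GloVe vectors, using a made-up 10-word vocabulary and 4-dimensional embeddings:

import numpy as np
from keras.models import Sequential
from keras.layers.embeddings import Embedding

toy = Sequential([Embedding(input_dim=10, output_dim=4, input_length=3)])
batch = np.array([[1, 2, 0], [3, 0, 0]])     # (batch size, max input length) = (2, 3), integer word indices
print(toy.predict(batch).shape)              # (2, 3, 4) = (batch size, max input length, word vector dimension)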

The first step is to convert all training sentences into lists of word indices, then pad these lists with 0 until each has the length of the longest sentence.

def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4).
    
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    m = X.shape[0]                                   # number of training examples
    
    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices = np.zeros((m, max_len))
    
    for i in range(m):                               # loop over training examples
        # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
        words = X[i].lower().split()
        # Initialize j to 0
        j = 0
        
        # Loop over the words of sentence_words
        for w in words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i][j] = word_to_index[w]
            # Increment j to j + 1
            j += 1
    ### END CODE HERE ###
    
    return X_indices


X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1, word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)

X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices = [[ 155345.  225122.       0.       0.       0.]
 [ 220930.  286375.   69714.       0.       0.]
 [ 151204.  192973.  302254.  151349.  394475.]]

Next, build the Embedding() layer using the pre-trained word vectors. Once it is built, feeding it the output of sentences_to_indices() returns the word embeddings of each sentence.

Implement the pretrained_embedding_layer() function in the following steps:

  1. Initialize the embedding matrix with zeros.
  2. Fill each row of the embedding matrix with the corresponding word vector from word_to_vec_map.
  3. Define the Embedding layer in Keras; when calling Embedding(), make the layer's parameters non-trainable by setting trainable=False.
  4. Set the embedding layer's weights to the embedding matrix.
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    
    ### START CODE HERE ###
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define the Keras embedding layer with the correct output/input sizes and make it non-trainable by setting trainable=False.
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    ### END CODE HERE ###

    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    
    return embedding_layer


embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])

weights[0][1][3] = -0.3403

3.2 Building Emojifier-V2

Feed the output of the Embedding layer built above into the LSTM.

The input to the model has shape (m, max_len), defined through input_shape, and the output has shape (m, C=5), the probabilities of the 5 possible emojis. You will need Input(shape=..., dtype='...'), LSTM(), Dropout(), Dense(), and Activation().

LSTM parameters:

  • units: the output dimensionality. If units=128, then for a single word you can roughly simplify what happens inside the LSTM to a transformation of the form W·X, where X is the word vector (say 64-dimensional) and the 128 in W's shape is units; in other words, the LSTM maps the word representation from 64 dimensions to 128.
  • return_sequences: if False, only the output of the last timestep is returned, a 2D tensor of shape (samples, output_dim); if True, the outputs of all timesteps are returned, a 3D tensor of shape (samples, timesteps, output_dim). Here samples is the number of examples, timesteps is the number of timesteps per example, and output_dim is the number of features at each timestep (see the short shape check below).
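A minimal sketch of the two return_sequences behaviors, using toy shapes (10 timesteps of 64 features) rather than this assignment's dimensions:

from keras import backend as K
from keras.layers import Input, LSTM

x = Input(shape=(10, 64))                     # toy input: 10 timesteps, 64 features each
seq = LSTM(128, return_sequences=True)(x)     # output at every timestep
last = LSTM(128, return_sequences=False)(x)   # output at the last timestep only
print(K.int_shape(seq))                       # (None, 10, 128)
print(K.int_shape(last))                      # (None, 128)
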
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')          # (m, max_len)
    
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)            # X.shape = (?, ?, 128)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)                                         # rate=0.5, the fraction of the input units to drop
    
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)                                             # units=5, output shape (*, 5)
    # Add a softmax activation
    X = Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
    
    ### END CODE HERE ###
    
    return model


model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()

All sentences in the dataset are no longer than 10 words, so max_len = 10 is chosen. The summary below reports 20,223,927 parameters, of which 20,000,050 are not trained (these are the word vectors) and the remaining 223,877 are trained. Since the vocabulary contains 400,001 words, there are 400,001 × 50 = 20,000,050 non-trainable parameters.

 1 _________________________________________________________________
 2 Layer (type)                 Output Shape              Param #   
 3 =================================================================
 4 input_2 (InputLayer)         (None, 10)                0         
 5 _________________________________________________________________
 6 embedding_5 (Embedding)      (None, 10, 50)            20000050  
 7 _________________________________________________________________
 8 lstm_1 (LSTM)                (None, 10, 128)           91648     
 9 _________________________________________________________________
10 dropout_1 (Dropout)          (None, 10, 128)           0         
11 _________________________________________________________________
12 lstm_2 (LSTM)                (None, 128)               131584    
13 _________________________________________________________________
14 dropout_2 (Dropout)          (None, 128)               0         
15 _________________________________________________________________
16 dense_1 (Dense)              (None, 5)                 645       
17 _________________________________________________________________
18 activation_1 (Activation)    (None, 5)                 0         
19 =================================================================
20 Total params: 20,223,927
21 Trainable params: 223,877
22 Non-trainable params: 20,000,050
23 _________________________________________________________________
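These counts can be checked by hand with the usual parameter formulas (4 × (units × (input_dim + units) + units) for an LSTM layer, weights plus biases for Dense):

embedding_5: 400,001 × 50 = 20,000,050   (non-trainable)
lstm_1:      4 × (128 × (50 + 128) + 128)  = 91,648
lstm_2:      4 × (128 × (128 + 128) + 128) = 131,584
dense_1:     128 × 5 + 5 = 645
trainable:   91,648 + 131,584 + 645 = 223,877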

After creating the model, compile it and then train it.

# Compile the model with the categorical_crossentropy loss, the adam optimizer, and the ['accuracy'] metric
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Convert X_train to X_train_indices and Y_train to Y_train_oh
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)      # (132, 10)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)                             # (132, 5)

# Fit the model
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)

Output:

  1 Epoch 1/50
  2 132/132 [==============================] - 1s - loss: 0.0283 - acc: 0.9848         
  3 Epoch 2/50
  4 132/132 [==============================] - 0s - loss: 0.3319 - acc: 0.9394     
  5 Epoch 3/50
  6 132/132 [==============================] - 0s - loss: 0.1317 - acc: 0.9621     
  7 Epoch 4/50
  8 132/132 [==============================] - 0s - loss: 0.0577 - acc: 0.9848     
  9 Epoch 5/50
 10 132/132 [==============================] - 0s - loss: 0.0205 - acc: 0.9924     
 11 Epoch 6/50
 12 132/132 [==============================] - 0s - loss: 0.0164 - acc: 0.9924     
 13 Epoch 7/50
 14 132/132 [==============================] - 0s - loss: 0.0219 - acc: 0.9848     
 15 Epoch 8/50
 16 132/132 [==============================] - 0s - loss: 0.0028 - acc: 1.0000         
 17 Epoch 9/50
 18 132/132 [==============================] - 0s - loss: 0.0017 - acc: 1.0000     
 19 Epoch 10/50
 20 132/132 [==============================] - 0s - loss: 0.0079 - acc: 0.9924     
 21 Epoch 11/50
 22 132/132 [==============================] - 0s - loss: 0.0036 - acc: 1.0000     
 23 Epoch 12/50
 24 132/132 [==============================] - 0s - loss: 0.0015 - acc: 1.0000         
 25 Epoch 13/50
 26 132/132 [==============================] - 0s - loss: 0.0017 - acc: 1.0000     
 27 Epoch 14/50
 28 132/132 [==============================] - 0s - loss: 0.0045 - acc: 1.0000         
 29 Epoch 15/50
 30 132/132 [==============================] - 0s - loss: 0.0011 - acc: 1.0000     
 31 Epoch 16/50
 32 132/132 [==============================] - 0s - loss: 8.1594e-04 - acc: 1.0000     
 33 Epoch 17/50
 34 132/132 [==============================] - 0s - loss: 0.0018 - acc: 1.0000         
 35 Epoch 18/50
 36 132/132 [==============================] - 0s - loss: 8.9179e-04 - acc: 1.0000     
 37 Epoch 19/50
 38 132/132 [==============================] - 0s - loss: 7.8024e-04 - acc: 1.0000     
 39 Epoch 20/50
 40 132/132 [==============================] - 0s - loss: 8.1553e-04 - acc: 1.0000     
 41 Epoch 21/50
 42 132/132 [==============================] - 0s - loss: 6.0965e-04 - acc: 1.0000     
 43 Epoch 22/50
 44 132/132 [==============================] - 0s - loss: 7.9421e-04 - acc: 1.0000     
 45 Epoch 23/50
 46 132/132 [==============================] - 0s - loss: 5.9728e-04 - acc: 1.0000     
 47 Epoch 24/50
 48 132/132 [==============================] - 0s - loss: 0.0015 - acc: 1.0000         
 49 Epoch 25/50
 50 132/132 [==============================] - 0s - loss: 6.5783e-04 - acc: 1.0000     
 51 Epoch 26/50
 52 132/132 [==============================] - 0s - loss: 3.8151e-04 - acc: 1.0000     
 53 Epoch 27/50
 54 132/132 [==============================] - 0s - loss: 7.5537e-04 - acc: 1.0000     
 55 Epoch 28/50
 56 132/132 [==============================] - 0s - loss: 6.7794e-04 - acc: 1.0000     
 57 Epoch 29/50
 58 132/132 [==============================] - 0s - loss: 8.7243e-04 - acc: 1.0000     
 59 Epoch 30/50
 60 132/132 [==============================] - 0s - loss: 5.2032e-04 - acc: 1.0000     
 61 Epoch 31/50
 62 132/132 [==============================] - 0s - loss: 0.0010 - acc: 1.0000     
 63 Epoch 32/50
 64 132/132 [==============================] - 0s - loss: 5.7492e-04 - acc: 1.0000     
 65 Epoch 33/50
 66 132/132 [==============================] - 0s - loss: 6.1675e-04 - acc: 1.0000     
 67 Epoch 34/50
 68 132/132 [==============================] - 0s - loss: 4.4331e-04 - acc: 1.0000     
 69 Epoch 35/50
 70 132/132 [==============================] - 0s - loss: 3.7460e-04 - acc: 1.0000     
 71 Epoch 36/50
 72 132/132 [==============================] - 0s - loss: 4.6226e-04 - acc: 1.0000     
 73 Epoch 37/50
 74 132/132 [==============================] - 0s - loss: 2.9814e-04 - acc: 1.0000     
 75 Epoch 38/50
 76 132/132 [==============================] - 0s - loss: 4.8404e-04 - acc: 1.0000     
 77 Epoch 39/50
 78 132/132 [==============================] - 0s - loss: 3.0782e-04 - acc: 1.0000     
 79 Epoch 40/50
 80 132/132 [==============================] - 0s - loss: 3.2745e-04 - acc: 1.0000     
 81 Epoch 41/50
 82 132/132 [==============================] - 0s - loss: 3.2599e-04 - acc: 1.0000     
 83 Epoch 42/50
 84 132/132 [==============================] - 0s - loss: 8.4097e-04 - acc: 1.0000     
 85 Epoch 43/50
 86 132/132 [==============================] - 0s - loss: 2.8714e-04 - acc: 1.0000     
 87 Epoch 44/50
 88 132/132 [==============================] - 0s - loss: 4.5671e-04 - acc: 1.0000     
 89 Epoch 45/50
 90 132/132 [==============================] - 0s - loss: 3.2355e-04 - acc: 1.0000     
 91 Epoch 46/50
 92 132/132 [==============================] - 0s - loss: 3.3007e-04 - acc: 1.0000     
 93 Epoch 47/50
 94 132/132 [==============================] - 0s - loss: 2.8113e-04 - acc: 1.0000     
 95 Epoch 48/50
 96 132/132 [==============================] - 0s - loss: 4.7191e-04 - acc: 1.0000     
 97 Epoch 49/50
 98 132/132 [==============================] - 0s - loss: 5.0544e-04 - acc: 1.0000     
 99 Epoch 50/50
100 132/132 [==============================] - 0s - loss: 1.7207e-04 - acc: 1.0000   

The training accuracy is essentially 1.

Evaluate the model on the test set:

X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)     # (56, 10)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)                                      # (56, 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)

print("Test accuracy = ", acc)

32/56 [================>.............] - ETA: 0sTest accuracy =  0.821428562914

Look at which sentences were mislabeled:

C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    x = X_test_indices
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())

Expected emoji:😄 prediction: he got a very nice raise ❤️
Expected emoji:😄 prediction: she got me a nice present ❤️
Expected emoji:😞 prediction: This girl is messing with me ❤️
Expected emoji:😄 prediction: you brighten my day ❤️
Expected emoji:😞 prediction: she is a bully ❤️
Expected emoji:⚾ prediction: enjoy your game😄
Expected emoji:😞 prediction: My life is so boring ❤️
Expected emoji:😄 prediction: will you be my valentine ❤️
Expected emoji:⚾ prediction: what is your favorite baseball game 😄
Expected emoji:😞 prediction: go away ⚾

Try your own sentence:

x_test = np.array(["not feeling happy"])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))

not feeling happy 😞
