- ResNet extracts the image features: the output of a specified layer is taken as the image representation. A word-to-index mapping is built over the descriptions (captions) to generate word vectors, which are then passed through an LSTM. The LSTM output goes through a fully connected layer that predicts the next word, and the cross-entropy loss is computed against the actual next word.
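The text side of this pipeline (building the mapping, predicting the following word, and computing the cross-entropy loss) can be illustrated with a minimal sketch in plain Python. It is not the original implementation: the special tokens `<pad>`, `<start>`, `<end>`, the helper names `build_vocab`, `encode`, `next_word_pairs`, and `cross_entropy`, and the two sample captions are all assumptions made for illustration. Uniform logits stand in for what the ResNet + LSTM + fully connected layer would actually produce.

```python
import math

def build_vocab(captions):
    """Map each word to an integer id; ids 0-2 are reserved (assumed) special tokens."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(caption, vocab):
    """Turn a caption into ids wrapped in <start> ... <end>."""
    words = caption.lower().split()
    return [vocab["<start>"]] + [vocab[w] for w in words] + [vocab["<end>"]]

def next_word_pairs(ids):
    """The input at step t is token t; the target is the following token t+1."""
    return ids[:-1], ids[1:]

def cross_entropy(logits, target):
    """Cross-entropy for one step: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical sample data.
captions = ["a dog runs", "a cat sleeps"]
vocab = build_vocab(captions)
ids = encode(captions[0], vocab)
inputs, targets = next_word_pairs(ids)

# Placeholder for the decoder output: one logit per vocabulary word at each step.
uniform_logits = [0.0] * len(vocab)
loss = sum(cross_entropy(uniform_logits, t) for t in targets) / len(targets)
print(inputs, targets, loss)
```

With uniform logits the average loss equals the log of the vocabulary size, which is a handy sanity check for an untrained model: the loss should start near that value and fall as training proceeds.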