Locating and Recognizing Bank Card Numbers with CRNN + EAST
Source code: https://github.com/ShawnHXH/BankCard-Recognizer
Tools: Python 3.6, Windows 10, Keras (with the TensorFlow backend)
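CRNN:
Requirements analysis:
1. Bank card numbers do not have a fixed length: some have 20 characters, others only 19. The model must therefore be able to recognize card numbers of variable length;
2. The model takes an image as input and produces text as output, so it needs both a CNN and an RNN, hence the name CRNN.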
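Model selection:
1. For variable-length recognition, CTC is currently the most widely used loss function;
2. The CNN follows a VGG-style design; the RNN can be a bidirectional LSTM (BLSTM) or a GRU;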
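To illustrate why CTC suits variable-length card numbers, here is a minimal sketch of greedy CTC decoding in plain Python: the network emits one class per time step, and decoding collapses repeated classes and then drops the blank. The function name `ctc_greedy_decode`, the class layout (digits 0-9 plus a blank at index 10), and the sample frames are assumptions made for illustration, not taken from the project.

```python
def ctc_greedy_decode(frame_classes, blank):
    """Collapse a per-frame class sequence into a label sequence.

    frame_classes: the most likely class index at each time step.
    blank: index of the CTC blank class (an assumption of this sketch).
    """
    decoded = []
    previous = None
    for cls in frame_classes:
        # Keep a class only when it starts a new run and is not the blank.
        if cls != previous and cls != blank:
            decoded.append(cls)
        previous = cls
    return decoded


# Assumed layout: classes 0-9 are digits, class 10 is the blank.
BLANK = 10
frames = [6, 6, BLANK, 2, 2, 2, BLANK, BLANK, 2, 1, 1, BLANK]
digits = ctc_greedy_decode(frames, BLANK)
card_fragment = "".join(str(d) for d in digits)
```

Twelve frames decode to a four-digit string here, and the blank between the two runs of 2 is what lets a doubled digit survive the collapse. The same number of frames can therefore yield outputs of different lengths.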
Model preview:
1. CNN part:
from keras import initializers
from keras.layers import (Activation, BatchNormalization, Conv2D, Input,
                          MaxPooling2D, Permute)

# img_height and img_width are assumed to be defined elsewhere.


def PatternUnits(inputs, index, activation="relu"):
    # Batch normalization followed by the activation.
    inputs = BatchNormalization(name="BN_%d" % index)(inputs)
    inputs = Activation(activation, name="Relu_%d" % index)(inputs)

    return inputs


initializer = initializers.he_normal()
inputs = Input(shape=(img_height, img_width, 1), name='img_inputs')

# Blocks 1-2: one convolution each, pooling halves both height and width.
x = Conv2D(64, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_1')(inputs)
x = PatternUnits(x, 1)
x = MaxPooling2D(strides=2, name='Maxpool_1')(x)
x = Conv2D(128, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_2')(x)
x = PatternUnits(x, 2)
x = MaxPooling2D(strides=2, name='Maxpool_2')(x)

# Blocks 3-4: two convolutions each, pooling halves the height only.
x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_3')(x)
x = PatternUnits(x, 3)
x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_4')(x)
x = PatternUnits(x, 4)
x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_3')(x)

x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_5')(x)
x = PatternUnits(x, 5)
x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_6')(x)
x = PatternUnits(x, 6)
x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_4')(x)

# Block 5: final convolution and one more height-only pooling.
x = Conv2D(512, (2, 2), padding='same', activation='relu', kernel_initializer=initializer, name='Conv2d_7')(x)
x = PatternUnits(x, 7)
conv_output = MaxPooling2D(pool_size=(2, 1), name="Conv_output")(x)

# (height, width, channels) -> (width, channels, height): width becomes the time axis.
x = Permute((2, 3, 1), name='Permute')(conv_output)
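The pooling layers above decide how many time steps the RNN will see, so it can help to trace the feature-map size by hand. The sketch below does this in plain Python. The helper names `pool_out` and `cnn_output_shape` and the 32 x 256 input size are assumptions chosen for illustration, and the arithmetic assumes Keras' default `'valid'` padding for `MaxPooling2D`.

```python
def pool_out(size, pool, stride):
    """Output length of a 'valid' max-pooling along one axis."""
    return (size - pool) // stride + 1


def cnn_output_shape(img_height, img_width):
    """Trace (height, width) through the five pooling layers of the CNN.

    The 'same'-padded convolutions keep height and width unchanged,
    so only the pooling layers matter here.
    """
    # (pool_h, pool_w, stride_h, stride_w) for each pooling layer, in order.
    pools = [
        (2, 2, 2, 2),  # Maxpool_1 (default pool_size is (2, 2))
        (2, 2, 2, 2),  # Maxpool_2
        (2, 1, 2, 1),  # Maxpool_3
        (2, 1, 2, 1),  # Maxpool_4
        (2, 1, 2, 1),  # Conv_output (strides default to pool_size)
    ]
    h, w = img_height, img_width
    for ph, pw, sh, sw in pools:
        h, w = pool_out(h, ph, sh), pool_out(w, pw, sw)
    return h, w


# Assumed input size of 32 x 256 (height x width), for illustration only.
h, w = cnn_output_shape(32, 256)
channels = 512                    # filters in the last convolution
time_steps = w                    # width becomes the time axis after Permute
features_per_step = channels * h  # size of each flattened time step
```

Under these assumptions the height shrinks by a factor of 32 while the width shrinks only by a factor of 4, which keeps enough horizontal resolution for one prediction per narrow slice of the card number.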
2. RNN part (using BLSTM):
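from keras.layers import (BatchNormalization, Bidirectional, Dense, Flatten,
                          LSTM, TimeDistributed)

# num_classes is assumed to be defined elsewhere.

# Flatten each time step into a single feature vector.
rnn_input = TimeDistributed(Flatten(), name='Flatten_by_time')(x)
# First BLSTM: forward and backward outputs are summed.
y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True), merge_mode='sum', name='LSTM_1')(rnn_input)
y = BatchNormalization(name='BN_8')(y)
# Second BLSTM: forward and backward outputs are concatenated (the default).
y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True), name='LSTM_2')(y)
# Per-time-step class probabilities.
y_pred = Dense(num_classes, activation='softmax', name='y_pred')(y)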
3. CTC loss function:
from keras import backend as K
from keras.layers import Input, Lambda

# max_label_length is assumed to be defined elsewhere.


def ctc_loss_layer(args):
    """
    Compute the CTC loss inside a Lambda layer.

    y_true: true labels.
    y_pred: predicted per-time-step class probabilities.
    pred_length: length of each predicted sequence.
    label_length: length of each true label.
    :param args: (y_true, y_pred, pred_length, label_length).
    :return: batch_cost with shape (batch_size, 1).
    """
    y_true, y_pred, pred_length, label_length = args
    batch_cost = K.ctc_batch_cost(y_true, y_pred, pred_length, label_length)
    return batch_cost


y_true = Input(shape=[max_label_length], name='y_true')
y_pred_length = Input(shape=[1], name='y_pred_length')
y_true_length = Input(shape=[1], name='y_true_length')
ctc_loss_output = Lambda(ctc_loss_layer, output_shape=(1,), name='ctc_loss_output')([y_true, y_pred, y_pred_length, y_true_length])
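The CTC layer takes three extra inputs besides the prediction: the padded labels and two length vectors. As a rough sketch of how a batch of these might be assembled, the plain-Python helper below pads each label to `max_label_length` and checks that the sequence has enough time steps. The helper names `min_ctc_frames` and `make_ctc_inputs`, the padding value, and the sample labels are hypothetical; the project's own data generator may work differently.

```python
def min_ctc_frames(label):
    """Fewest time steps CTC needs to emit `label`.

    Adjacent repeated symbols need a blank between them, so each
    repeat costs one extra frame.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def make_ctc_inputs(labels, max_label_length, time_steps, pad_value=0):
    """Build y_true, y_pred_length and y_true_length as plain lists."""
    y_true, y_pred_length, y_true_length = [], [], []
    for label in labels:
        if len(label) > max_label_length:
            raise ValueError("label longer than max_label_length")
        if min_ctc_frames(label) > time_steps:
            raise ValueError("not enough time steps for this label")
        # Pad to a fixed width; the true length is passed separately.
        padding = [pad_value] * (max_label_length - len(label))
        y_true.append(list(label) + padding)
        y_pred_length.append([time_steps])
        y_true_length.append([len(label)])
    return y_true, y_pred_length, y_true_length


# Hypothetical labels and sizes, for illustration only.
labels = [[6, 2, 2, 1], [4, 0, 3]]
y_true, y_pred_length, y_true_length = make_ctc_inputs(
    labels, max_label_length=6, time_steps=64)
```

Because the true label length is supplied alongside the padded labels, the padding value itself should not influence the loss. The frame check matters more in practice: a label that cannot fit into the available time steps has no valid CTC alignment.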
EAST:
The most popular text-localization algorithm for images at the moment is arguably CTPN; others include Faster R-CNN, SegLink, Mask R-CNN, and EAST.
For details, see: https://github.com/huoyijie/AdvancedEAST