Locating and Recognizing Bank Card Numbers with CRNN + EAST

Source code: https://github.com/ShawnHXH/BankCard-Recognizer

Tools: Python 3.6, Windows 10, Keras (TensorFlow backend)

 

CRNN:

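Requirements analysis:

  1. The length of a bank card number is not fixed: some have 20 characters, others only 19. The model must therefore be able to recognize card numbers of variable length;

  2. The model takes an image as input and produces text as output, so it needs both a CNN and an RNN, hence the name CRNN.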

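Model selection:

  1. For variable-length recognition, CTC is currently the most widely used loss function;

  2. VGG is used for the CNN; the RNN can be a bidirectional LSTM (BLSTM) or a GRU;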

Model preview:

  1. CNN part:

from keras import initializers
from keras.layers import Activation, BatchNormalization, Conv2D, Input, MaxPooling2D, Permute


def PatternUnits(inputs, index, activation="relu"):
    # Batch normalization followed by an activation, applied after each convolution.
    inputs = BatchNormalization(name="BN_%d" % index)(inputs)
    inputs = Activation(activation, name="Relu_%d" % index)(inputs)
    return inputs


# img_height and img_width are defined elsewhere in the project.
initializer = initializers.he_normal()
inputs = Input(shape=(img_height, img_width, 1), name='img_inputs')

# Blocks 1 and 2: halve both height and width.
x = Conv2D(64, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_1')(inputs)
x = PatternUnits(x, 1)
x = MaxPooling2D(strides=2, name='Maxpool_1')(x)
x = Conv2D(128, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_2')(x)
x = PatternUnits(x, 2)
x = MaxPooling2D(strides=2, name='Maxpool_2')(x)

# From here on, pooling halves only the height, so the width (time axis) is preserved.
x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_3')(x)
x = PatternUnits(x, 3)
x = Conv2D(256, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_4')(x)
x = PatternUnits(x, 4)
x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_3')(x)

x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_5')(x)
x = PatternUnits(x, 5)
x = Conv2D(512, (3, 3), padding="same", kernel_initializer=initializer, name='Conv2d_6')(x)
x = PatternUnits(x, 6)
x = MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='Maxpool_4')(x)

x = Conv2D(512, (2, 2), padding='same', activation='relu', kernel_initializer=initializer, name='Conv2d_7')(x)
x = PatternUnits(x, 7)
conv_output = MaxPooling2D(pool_size=(2, 1), name="Conv_output")(x)

# Reorder (height, width, channels) to (width, channels, height) so that width becomes the time axis.
x = Permute((2, 3, 1), name='Permute')(conv_output)
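To see how the pooling layers above determine the sequence length handed to the RNN, the following sketch traces the feature-map shape through the five pooling layers. The 32 x 256 input size is an assumption made purely for illustration (the post does not state img_height or img_width), and the helper names pooled_size, cnn_output_shape and rnn_input_shape are hypothetical.

```python
# Trace the feature-map size through the pooling layers of the CNN above.
# The convolutions use padding="same" with stride 1, so only pooling changes the size.

def pooled_size(size, pool, stride):
    """Output length of a 'valid' max-pooling along one axis."""
    return (size - pool) // stride + 1

# (pool_size, strides) as (height, width) pairs, in network order.
POOLS = [
    ((2, 2), (2, 2)),  # Maxpool_1
    ((2, 2), (2, 2)),  # Maxpool_2
    ((2, 1), (2, 1)),  # Maxpool_3
    ((2, 1), (2, 1)),  # Maxpool_4
    ((2, 1), (2, 1)),  # Conv_output (strides default to pool_size)
]

def cnn_output_shape(img_height, img_width, channels=512):
    h, w = img_height, img_width
    for (ph, pw), (sh, sw) in POOLS:
        h = pooled_size(h, ph, sh)
        w = pooled_size(w, pw, sw)
    return h, w, channels

def rnn_input_shape(img_height, img_width):
    h, w, c = cnn_output_shape(img_height, img_width)
    # Permute((2, 3, 1)) gives (w, c, h); flattening each step leaves c * h features.
    return w, c * h

# ASSUMPTION: a 32 x 256 grayscale input, chosen only for illustration.
print(cnn_output_shape(32, 256))  # (1, 64, 512)
print(rnn_input_shape(32, 256))   # (64, 512)
```

Under this assumed input size, each image becomes a sequence of 64 time steps with 512 features per step. The number of time steps matters because CTC cannot emit a label longer than the sequence it is given.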

  2. RNN part (using BLSTM):

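from keras.layers import LSTM, BatchNormalization, Bidirectional, Dense, Flatten, TimeDistributed

# Flatten each time step (one feature-map column) into a single feature vector.
rnn_input = TimeDistributed(Flatten(), name='Flatten_by_time')(x)
# First BLSTM: forward and backward outputs are summed (256 features per step).
y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True), merge_mode='sum', name='LSTM_1')(rnn_input)
y = BatchNormalization(name='BN_8')(y)
# Second BLSTM: the default merge_mode='concat' yields 512 features per step.
y = Bidirectional(LSTM(256, kernel_initializer=initializer, return_sequences=True), name='LSTM_2')(y)
# Per-step class probabilities; num_classes is defined elsewhere and must include the CTC blank.
y_pred = Dense(num_classes, activation='softmax', name='y_pred')(y)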
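At inference time, the per-step probabilities in y_pred still have to be turned into a card number. A minimal sketch of greedy (best-path) CTC decoding follows. It assumes that classes 0 to 9 are the digits and that the last class is the CTC blank; the post does not state the character set, so CHARSET, BLANK, greedy_ctc_decode and one_hot are hypothetical names used only for illustration.

```python
# Greedy (best-path) CTC decoding in plain Python.
# ASSUMPTION: classes 0-9 are the digits and the last class is the CTC blank.

CHARSET = "0123456789"   # hypothetical character set
BLANK = len(CHARSET)     # blank index = num_classes - 1

def greedy_ctc_decode(probs):
    """probs: one list of class probabilities per time step."""
    # Pick the most probable class at every time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    chars = []
    previous = None
    for index in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if index != previous and index != BLANK:
            chars.append(CHARSET[index])
        previous = index
    return "".join(chars)

def one_hot(index, num_classes=len(CHARSET) + 1):
    """Helper that builds a fake softmax row for the demo below."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# The path "6 6 _ 2 2 _ 2" collapses to "622": the blank between the
# two runs of 2 is what lets CTC output a repeated digit.
path = [6, 6, BLANK, 2, 2, BLANK, 2]
print(greedy_ctc_decode([one_hot(i) for i in path]))  # 622
```

Keras also provides K.ctc_decode, which performs greedy or beam-search decoding on batched tensors. The plain-Python version here is only meant to make the collapse rule explicit.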

  3. CTC loss function:

from keras import backend as K
from keras.layers import Input, Lambda


def ctc_loss_layer(args):
    """
    y_true: True labels, padded to max_label_length.
    y_pred: Predicted per-step class probabilities.
    pred_length: Number of time steps in each prediction sequence.
    label_length: Length of each true label.
    :param args: (y_true, y_pred, pred_length, label_length).
    :return: batch_cost with shape (batch_size, 1).
    """
    y_true, y_pred, pred_length, label_length = args
    batch_cost = K.ctc_batch_cost(y_true, y_pred, pred_length, label_length)
    return batch_cost


# Extra inputs needed only to compute the loss during training.
# max_label_length is defined elsewhere in the project.
y_true = Input(shape=[max_label_length], name='y_true')
y_pred_length = Input(shape=[1], name='y_pred_length')
y_true_length = Input(shape=[1], name='y_true_length')
ctc_loss_output = Lambda(ctc_loss_layer, output_shape=(1,), name='ctc_loss_output')([y_true, y_pred, y_pred_length, y_true_length])
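The three extra inputs above have to be filled for every training batch. The sketch below shows one way the label arrays could be built. It assumes that each digit maps to the class index of the same value and that the padding value is 0; neither is stated in the post. The function name encode_labels and the sample card numbers are made up for illustration.

```python
# Build the label-side inputs that the CTC loss layer expects.
# ASSUMPTIONS: digits map to classes 0-9, and padding with 0 is harmless because
# y_true_length tells the loss how many entries of each row are real.

def encode_labels(card_numbers, max_label_length, time_steps):
    y_true, y_pred_length, y_true_length = [], [], []
    for number in card_numbers:
        # Keep digits only, so grouping spaces on the card are ignored.
        digits = [int(ch) for ch in number if ch.isdigit()]
        if len(digits) > max_label_length:
            raise ValueError("label longer than max_label_length: %r" % number)
        padding = [0] * (max_label_length - len(digits))
        y_true.append(digits + padding)        # shape (batch, max_label_length)
        y_pred_length.append([time_steps])     # shape (batch, 1)
        y_true_length.append([len(digits)])    # shape (batch, 1)
    return y_true, y_pred_length, y_true_length

# Made-up card numbers of different lengths (19 and 16 digits).
labels, pred_len, true_len = encode_labels(
    ["1234 5678 9012 3456 789", "1234567890123456"],
    max_label_length=20,
    time_steps=64,  # ASSUMPTION: sequence length produced by the CNN
)
print(true_len)  # [[19], [16]]
```

In practice these nested lists would be converted with numpy.asarray before being passed to the model, together with the image batch.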

 

EAST:

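The most popular algorithm for locating text in images at present is arguably CTPN; others include Faster R-CNN, SegLink, Mask R-CNN, EAST, and so on.

For details, see: https://github.com/huoyijie/AdvancedEAST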

 

posted on 2019-08-25 14:20 by ShawnHu
