Scene Text Recognition

To recognize text in images, e.g., advertisements or traffic signals, we decompose Scene Text Recognition into two stages:

  • Scene Text Detection (STD). Detect and extract text regions with the Progressive Scale Expansion Network (PSENet). See paper here. PSENet predicts several kernels at different scales for each text instance, then gradually expands the smallest kernel until it covers the complete text instance, which may have an arbitrary shape.
  • Optical Character Recognition (OCR). Recognize the extracted text with a Convolutional Recurrent Neural Network (CRNN). See paper here. CRNN extracts convolutional features with a CNN and feeds the resulting feature sequence to a bidirectional LSTM. Because it is trained with CTC loss, it can handle text of varying length and does not require character- or word-level annotation. More info on CTC loss.
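
The two core ideas above, progressive kernel expansion and CTC-style decoding, can be sketched in a few lines of pure Python. This is only an illustrative sketch, not the code used in this project: all function names (`label_components`, `progressive_scale_expansion`, `ctc_greedy_decode`) are hypothetical, the kernels are assumed to be binary masks ordered from smallest to largest, and the recognizer output is assumed to be one class id per frame with id 0 reserved for the CTC blank.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity
BLANK = 0  # assumed id of the CTC blank symbol


def label_components(mask):
    """Assign labels 1, 2, ... to the 4-connected regions of a binary mask."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel through each larger one.

    Pixels claimed by two instances go to whichever reaches them first
    in breadth-first order.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start a multi-source BFS from every pixel that is already labelled.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and not labels[ny][nx]):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


def ctc_greedy_decode(frame_ids, alphabet):
    """Best-path CTC decoding: collapse repeated ids, then drop blanks."""
    chars = []
    previous = BLANK
    for idx in frame_ids:
        if idx != previous and idx != BLANK:
            chars.append(alphabet[idx - 1])  # id 1 maps to alphabet[0]
        previous = idx
    return "".join(chars)


# Two seeds in the smallest kernel stay separate instances even though
# the largest kernel is a single connected region.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
large = [[1] * 6 for _ in range(3)]
instances = progressive_scale_expansion([small, large])

# A blank between two identical ids keeps a genuine double letter.
text = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], "ab")
```

In the toy example, the two seeds grow into the left and right halves of the large mask, which is how kernel expansion separates adjacent text instances that would otherwise merge. The decoding example shows why variable-length output needs no per-character alignment: eight frames collapse to the three-character string "aab".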

Demo: