Document OCR

GitHub Repository: https://github.com/junwai7159/Document-OCR

About this Project

This project is part of the SJTU ICE4309 - Image Processing & Content Analysis course.

We implemented a 3-stage Optical Character Recognition (OCR) framework that converts in-the-wild document images into machine-readable text.

Architecture of Document OCR

[Figure: document_ocr]

First Stage: Preprocessing

  • Input images are preprocessed with edge detection, contour detection, perspective transformation, and binarization, so that the document region is located, rectified, and reduced to a clean black-and-white image.
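The last preprocessing step, binarization, can be illustrated with a small sketch. The project page does not say which thresholding method is used, so Otsu's method below is an assumption, and the function names `otsu_threshold` and `binarize` are hypothetical. The code is pure Python and works on nested lists of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Map a grayscale image (list of rows) to 0 (dark ink) / 1 (bright paper)."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p > t else 0 for p in row] for row in gray]
```

For unevenly lit photographs, a locally adaptive threshold may work better than a single global one; which of the two the project uses is not stated here.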

Second Stage: Text Detection

  • The text detection module uses the DBNet model with MobileNetV3 as the backbone network.
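As background on DBNet: the network predicts a probability map P and a threshold map T, and combines them with the differentiable binarization B = 1 / (1 + exp(-k (P - T))), where the DBNet paper sets k = 50. The sketch below applies that formula to tiny hand-written maps and then groups the resulting text pixels into axis-aligned boxes. The box extraction by flood fill is a simplification for illustration (real pipelines usually extract and expand polygons), and the names `differentiable_binarization` and `text_boxes` are hypothetical, not taken from the repository.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map: B = 1 / (1 + exp(-k * (P - T)))."""
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


def text_boxes(binary_map, min_score=0.5):
    """Group 4-connected pixels >= min_score into (x0, y0, x1, y1) boxes."""
    h, w = len(binary_map), len(binary_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or binary_map[y][x] < min_score:
                continue
            # Flood fill one connected region, tracking its extent.
            stack = [(y, x)]
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while stack:
                cy, cx = stack.pop()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and binary_map[ny][nx] >= min_score):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Each box would then be cropped from the rectified image and passed to the recognition stage.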

Third Stage: Text Recognition

  • The text recognition module uses the CRNN model with MobileNetV3 as the backbone network.
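A CRNN typically emits, for every horizontal position of a cropped text line, a score for each character plus a special blank symbol, and the final string is obtained by CTC decoding. Assuming that usual CTC formulation (the page does not describe the decoding step), greedy decoding can be sketched as follows: take the best class per time step, merge consecutive repeats, then drop blanks. The function name and the convention that index 0 is the blank are assumptions for this example.

```python
def ctc_greedy_decode(scores, alphabet, blank=0):
    """Greedy CTC decoding.

    scores:   list of per-time-step score lists, one score per class.
    alphabet: string whose i-th character is the label of class i;
              the character at index `blank` is only a placeholder.
    """
    # Best class index at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]

    chars = []
    prev = blank
    for idx in best:
        # Emit a character only when it is not blank and not a repeat
        # of the previous time step.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

A blank between two identical classes is what allows genuinely doubled letters: the index sequence 1, 1, 0, 1 decodes to two characters, not one.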

Results Visualization

Edge Detection

[Figure panels: Input Image, Grayscale Conversion, Gaussian Blur, Closing, Canny]
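Of the steps shown in these panels, closing is perhaps the least obvious. A morphological closing is a dilation followed by an erosion; on a grayscale page it tends to erase small dark details such as text strokes while keeping large dark regions, which plausibly helps Canny respond to the page outline instead of the text. That motivation is an assumption, as are the 3x3 window and the function names in the pure-Python sketch below.

```python
def _window_filter(img, k, pick):
    """Apply `pick` (max or min) over a k x k window, clipped at the borders."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def closing(img, k=3):
    """Grayscale closing: dilation (window max), then erosion (window min)."""
    dilated = _window_filter(img, k, max)
    return _window_filter(dilated, k, min)
```

With OpenCV, the same operation is available as `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.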

Contour Detection

[Figure panels: LSD, Horizontal Line Segments, Vertical Line Segments, Final Contour]
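The panels suggest that line segments found by LSD (Line Segment Detector) are split into near-horizontal and near-vertical groups before the final contour is chosen. One way this could work, sketched below under that assumption, is to pick the outermost segment of each group and intersect the corresponding lines to get the four document corners. The angle tolerance, the "outermost segment" rule, and all function names are hypothetical and not taken from the repository.

```python
import math


def split_by_orientation(segments, tol_deg=30.0):
    """Split (x1, y1, x2, y2) segments into near-horizontal and near-vertical."""
    horizontal, vertical = [], []
    for seg in segments:
        x1, y1, x2, y2 = seg
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1))) % 180.0
        if angle <= tol_deg or angle >= 180.0 - tol_deg:
            horizontal.append(seg)
        elif abs(angle - 90.0) <= tol_deg:
            vertical.append(seg)
    return horizontal, vertical


def intersect(seg_a, seg_b):
    """Intersection of the infinite lines through two segments, or None."""
    x1, y1, x2, y2 = seg_a
    x3, y3, x4, y4 = seg_b
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None  # parallel lines
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / d
    py = (a * (y3 - y4) - (y1 - y2) * b) / d
    return (px, py)


def document_corners(segments):
    """Return corners as [top-left, top-right, bottom-right, bottom-left]."""
    horizontal, vertical = split_by_orientation(segments)
    if len(horizontal) < 2 or len(vertical) < 2:
        return None
    mid_y = lambda s: (s[1] + s[3]) / 2
    mid_x = lambda s: (s[0] + s[2]) / 2
    top, bottom = min(horizontal, key=mid_y), max(horizontal, key=mid_y)
    left, right = min(vertical, key=mid_x), max(vertical, key=mid_x)
    return [
        intersect(top, left),
        intersect(top, right),
        intersect(bottom, right),
        intersect(bottom, left),
    ]
```

A production version would also need to merge collinear fragments and reject short or spurious segments, which this sketch leaves out.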

Perspective Transformation & Binarization

[Figure panels: Perspective Transformation, Binarization]
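A perspective transformation maps the four detected document corners onto the corners of an upright rectangle. The mapping is a 3x3 homography H, and four point pairs determine it up to scale. The sketch below fixes the bottom-right entry of H to 1 and solves the resulting 8x8 linear system with Gauss-Jordan elimination in pure Python. The function names and the example coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 homography mapping four (x, y) src points to four (u, v) dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]  # h22 is fixed to 1
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, x, y):
    """Apply the homography to one point (projective division included)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)
```

With OpenCV, `cv2.getPerspectiveTransform` computes the same kind of matrix and `cv2.warpPerspective` resamples the whole image with it. The sketch assumes the four source points are in general position (no three collinear); otherwise the system is singular.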

Text Detection & Recognition

[Figure panels: Text Detection, Text Recognition]

References