OCR (Optical Character Recognition) is the process of converting images of text (handwritten, typed, or printed) into machine-encoded text. Essentially, it’s reading text from images. Classic OCR techniques segment the text into characters or words and then classify those shapes as letters or digits. Early OCR systems used template matching or feature-based classifiers (such as an SVM on HOG features of characters). Modern OCR often uses deep learning: convolutional networks, recurrent networks, or a combination of the two, which output a sequence of characters given an image (possibly trained with CTC loss, as in some handwriting recognition systems). There is also end-to-end OCR, or scene text recognition: first find text in a natural image (localize bounding boxes of text lines or words), then recognize it. Tesseract is a well-known open-source OCR engine. OCR is crucial for digitizing documents, reading signs in vision systems, assistive technology for the visually impaired, automatic number plate recognition, and more.
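To make the CTC idea a little more concrete, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame network scores into a character string. It is an illustration only, not how any particular OCR engine does it: the alphabet, the blank index, and the helper names `ctc_greedy_decode` and `one_hot` are assumptions made for this example, and the "network output" is faked with one-hot score vectors.

```python
from itertools import groupby

# Assumption for this sketch: index 0 is the CTC blank, shown here as "-".
ALPHABET = "-abcdefghijklmnopqrstuvwxyz"
BLANK = 0


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Decode per-frame class scores into text with the greedy CTC rule."""
    # 1. Pick the highest-scoring class for every frame (time step).
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    # 2. Collapse runs of the same class into a single symbol.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop the blank symbol; what remains is the predicted text.
    return "".join(alphabet[i] for i in collapsed if i != blank)


def one_hot(index, size=len(ALPHABET)):
    """Hypothetical stand-in for one frame of network output."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Seven frames whose best classes spell "c c - a a - t".
frames = [one_hot(ALPHABET.index(ch)) for ch in "cc-aa-t"]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

The blank symbol is what lets the model emit genuine double letters: the frames "hel-lo" decode to "hello", while "hello" with no blank between the two l's collapses to "helo".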