For OCR, we need to assume an image has certain textual characteristics. For example, there is no point in using a picture of a tree as input to text recognition software.

After reading in an image, the first step in OCR is preprocessing. Any picture the software reads in can be represented as a matrix of red-green-blue (RGB) values. Rather than deal with three values per pixel, we can reduce the computation by converting from RGB to grayscale; instead of a matrix with three separate values in each cell, we now have a matrix with a single intensity value between 0 and 255 in each cell.
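As a minimal sketch of this conversion, the function below maps each RGB pixel to one intensity value. The weights 0.299, 0.587, and 0.114 are the common ITU-R BT.601 luma weights and are an assumption here, since the text does not say which formula was used; the name `rgb_to_gray` and the nested-list image layout are also only illustrative.

```python
def rgb_to_gray(image):
    """Convert a grid of (R, G, B) tuples into a grid of 0-255 intensities."""
    gray = []
    for row in image:
        gray.append([
            # Weighted sum of the channels (BT.601 weights, assumed),
            # rounded and clamped to the 0-255 range.
            min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b)))
            for (r, g, b) in row
        ])
    return gray


# A tiny 2x2 image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = rgb_to_gray(image)
```

Each cell of `gray` now holds one number instead of three, which is the form the later filtering steps work on.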

Another issue is that the images dealt with in OCR are not necessarily properly oriented; they may be skewed, rotated, or flipped. Incorrect orientation makes it harder to classify characters correctly. To remedy this, a typical preprocessing step applies a geometric transformation, such as a rotation and a translation, to the pixels of the image in an attempt to realign the text.
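The realignment step can be sketched as a rotation of the pixel grid about its center. This is only an illustration under stated assumptions: the skew angle is taken as already known (estimating it is a separate problem not covered here), sampling is nearest-neighbor, and `rotate_image` and its `fill` parameter are hypothetical names.

```python
import math


def rotate_image(gray, angle_deg, fill=255):
    """Rotate a grayscale grid about its center with nearest-neighbor sampling."""
    h, w = len(gray), len(gray[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Pixels with no source inside the original image keep the fill value.
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: translate to the center, rotate, translate back
            # to find which source pixel lands on output position (x, y).
            dx, dy = x - cx, y - cy
            sx = int(round(cos_t * dx + sin_t * dy + cx))
            sy = int(round(-sin_t * dx + cos_t * dy + cy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = gray[sy][sx]
    return out


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
upside_down = rotate_image(grid, 180)
```

Mapping from output pixels back to source pixels, rather than the other way around, avoids leaving holes in the rotated image.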

Depending on our inputs and assumptions, there are several filters we could apply. For our purposes, we will look at a few filters used in edge detection: Gaussian (which smooths the image to suppress noise before edges are computed), Laplacian, and Sobel.
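To make the filtering step concrete, here is a small sketch that slides a 3x3 kernel over a grayscale grid. The Sobel and Laplacian kernels shown are the standard textbook 3x3 forms, assumed here because the text does not give kernel values. The helper names `filter3x3` and `sobel_magnitude` are hypothetical, border pixels are simply left at 0, and the gradient magnitude is approximated as |Gx| + |Gy|.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to horizontal edges
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]    # second-derivative response


def filter3x3(gray, kernel):
    """Apply a 3x3 kernel to every interior pixel; border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                kernel[j][i] * gray[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


def sobel_magnitude(gray):
    """Approximate the gradient magnitude as |Gx| + |Gy|."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


# A 4x4 image with a vertical edge: dark left half, bright right half.
gray = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_magnitude(gray)

# A uniform image contains no edges, so the Laplacian response is zero.
flat = [[100] * 4 for _ in range(4)]
flat_response = filter3x3(flat, LAPLACIAN)
```

The interior pixels next to the dark-to-bright boundary get a large response, while the uniform image produces none.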

After filtering, we will use OpenCV's functions to detect and identify our characters. The API handles tasks such as setting the threshold for an edge and performing the actual edge detection. To classify the characters, we rely on OpenCV's implementation of a machine learning algorithm, K-Nearest Neighbors.
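The idea behind K-Nearest Neighbors can be sketched in a few lines of plain Python. This is not OpenCV's API, only an illustration of the technique: a query is given the majority label among the k closest training samples. The function name `knn_classify`, the tiny four-pixel "glyphs", and their labels are all made up for the example.

```python
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs; distance is squared Euclidean.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2x2 glyphs flattened to four intensities each.
train = [
    ([255, 0, 255, 0], "I"),
    ([250, 10, 245, 5], "I"),
    ([255, 255, 0, 0], "-"),
    ([240, 250, 10, 0], "-"),
]
label = knn_classify(train, [248, 5, 250, 12], k=3)
```

In practice each feature vector would be the flattened pixels of a segmented character image, and the training set would hold many labeled examples per character.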





Source:  OpenStax, Elec 301 projects fall 2014. OpenStax CNX. Jan 09, 2015 Download for free at http://legacy.cnx.org/content/col11734/1.2