The Crowley Company

OCR? ICR? IWR? OMG! Get the Most from Your Scanned Text

In celebration of last Friday’s National Handwriting Day, I decided to write a blog about Optical Character Recognition (OCR). Only when researching it for this blog did I discover that OCR actually has nothing to do with handwriting, once again proving how little I really know about the vast imaging industry (despite the approach of my second anniversary with the company). It was then that I discovered ICR and IWR. More on that later.

In layman’s terms (which is more my speed), OCR is the process by which typewritten (not handwritten) text in an image is recognized and converted into editable and searchable digital content. This is a useful tool for anyone who wants to add a search function to the material they’ve scanned. Searchable text simplifies image retrieval and reuse. It’s unlikely a program will guarantee 100% accuracy, but there are some that come quite close. And – as with much in our industry – it all comes down to scanner image quality.

Better Image Quality = Better OCR Accuracy

A pixelated image, varying font styles and characters that look alike are all possible reasons for inaccuracy in OCR processes

Peter Faber, sales manager with German production document scanner manufacturer InoTec GmbH, gave me his thoughts on OCR and image quality. “OCR is a difficult technology. The better your digital document, the better your OCR results will be.” He continues, “If the image is sharp and without pixel failure or noise,[1] the OCR program has a better chance of recognizing the characters. Of course, if you have a serif or handwriting style font, it makes it more difficult to recognize. If the image quality is better, the OCR program requires less time to recognize text.”

Ed Stracka, Crowley Imaging project manager, is of the same mind as Faber and gave me some insight into OCR from the service bureau perspective. He says, “Many users swear by a particular piece of software to produce text from a scanned document. The reality is that most of the popular OCR programs do a really good job of deciphering the characters. Generally, the higher quality the image, the more accurate the result.” With many years’ experience supervising scan jobs requiring OCR, Stracka is very familiar with the tips and tricks required to get the best results. “Since Crowley Imaging is expected to produce quality in all we do, we pay particular attention to not only the resolution of the scan but the capabilities of the software we are using to produce the OCR results. Some software has difficulty in resolving text next to a black border of the image. In those cases, we may remove or change the polarity of the border. Some software has difficulty with symbols, such as copyright, registered trademark and others. Some software has difficulty with languages that have diacritics.[2] Knowing the capabilities and shortcomings of the software is as important as the capability to scan at higher resolutions.”

What are the Benefits of Better OCR Results?

Technology has come a long way in the pursuit of text recognition. The more accurate it gets, the less effort is spent to fix inaccurate data. This should lead to time saved in post-processing and reduced labor hours. This, in turn, lowers the overall cost to scan and contributes to increased Return on Investment (ROI).
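To put a number on "accuracy," one common approach is to compare the OCR output against a hand-checked transcription and count how many single-character fixes (insertions, deletions or substitutions) separate the two. The sketch below is my own illustration, not the method used by any particular OCR product; the function names and the sample strings are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters that needed no correction (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


# Hypothetical sample: two look-alike characters were misread (O -> 0, e -> c).
reference = "Optical Character Recognition"
ocr_output = "0ptical Charactcr Recognition"

corrections = edit_distance(reference, ocr_output)
accuracy = character_accuracy(reference, ocr_output)
print(f"{corrections} corrections needed, accuracy {accuracy:.1%}")
```

Every correction counted here is one a person would otherwise have to make by hand, which is why even a small gain in accuracy can add up across thousands of pages.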

Faber gives an example of another possible benefit, saying, “With a high-quality scanner, and depending on the original, you might be able to scan with 200 dpi resolution instead of 300 dpi. This will make the file size smaller, saving digital storage space.”
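A bit of arithmetic shows why the drop from 300 to 200 dpi matters. The number of pixels grows with the square of the resolution, so an uncompressed 200 dpi scan holds roughly 44% of the data of a 300 dpi scan of the same page. The sketch below assumes a letter-size page (8.5 x 11 inches) scanned in 8-bit grayscale; both are my assumptions for illustration, and real-world files are usually compressed, so actual savings will vary.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Approximate uncompressed raster size for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page: US letter, 8-bit grayscale.
size_300 = uncompressed_size_bytes(8.5, 11, 300)
size_200 = uncompressed_size_bytes(8.5, 11, 200)
ratio = size_200 / size_300

print(f"300 dpi: {size_300 / 1_000_000:.2f} MB")
print(f"200 dpi: {size_200 / 1_000_000:.2f} MB")
print(f"200 dpi is {ratio:.0%} of the 300 dpi size")
```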

So, What About ICR and IWR?

An example of a form which may be filled out by hand and converted to searchable text using ICR

Not to be confused with OCR, Intelligent Character Recognition (ICR) is one technology that recognizes handwritten text. However, there are limitations to this technology. ICR programs are adept at recognizing written characters that are structured, meaning evenly spaced. One example is a form on which one writes information in fields with boxes designated for individual letters. Character recognition for unstructured or free-form handwriting, such as cursive, is called Intelligent Word Recognition (IWR) because it attempts to recognize the entire word instead of individual characters.*

No matter which program you may be using, capturing high-quality images is key to fast and accurate text recognition. I once again see the advantage in our offerings of archive-quality scanning equipment and the advanced technology utilized in our Crowley Imaging service bureaus. After 35 years in the business, we know that the better the image, the more useful it is to our clients.

Questions about Scanning or Character Recognition?

If you have any questions about character recognition technology or are interested in our document scanning equipment or services that offer this feature, please contact us by calling (240) 215-0224 or emailing blog@thecrowleycompany.com. You can also follow The Crowley Company on Facebook, Twitter, Google+, LinkedIn, Pinterest and YouTube.

*Editor’s Note: Although ICR/IWR technology has come a long way in recognizing handwriting, it’s not an exact science. As such, there is still a demand for human technology. The Smithsonian Institution is currently seeking digital volunteers for their Transcription Center in an effort to “make [their] collections more accessible and useful to curators, researchers, and anyone with a curious spirit.”


[1] http://en.wikipedia.org/wiki/Image_noise

[2] http://en.wikipedia.org/wiki/Diacritic

Author

With a bachelor’s degree in Mass Communication from Towson University, Camily Bishop serves as The Crowley Company’s sales and marketing assistant. A self-proclaimed member of the grammar police and avid reader of classical fiction, she can be found curled up with a good e-book or, on a nice day, experiencing the great outdoors – perhaps at the nearest wine festival.
