- Worldstart's Tech Tips Newsletter - http://www.worldstart.com -
Converting Images to Text
Posted on December 10, 2010 @ 10:08 AM in MS Word, Multimedia, Printing Help, Uncategorized
Daniel from Florida asks:
I’m able to copy documents I’ve typed on Word into the body of an (AOL) e-mail but I can’t copy any text that was scanned into my computer. How do I copy scanned documents into the body of an e-mail? I’m not talking about an attachment.
Today, everything seems to be about speed. There just doesn’t seem to be enough time to accomplish the multitude of tasks that make up our daily lives. As such, anything that reduces the time spent on repetitive tasks such as typing is more than welcome.
There are quite a few alternatives to typing, such as voice recognition, but none offer more benefits than OCR software. OCR (optical character recognition) is the process of converting scanned documents and image files into fully editable text. It’s capable of accomplishing this by analyzing a document and comparing it to fonts stored in its database. It can also guess unrecognized characters by comparing different character features.
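To make the "comparing it to fonts" idea a little more concrete, here is a toy sketch in Python. It is not how FreeOCR or Tesseract actually work internally; the tiny 3x3 letter shapes and the function names are made up for illustration. It simply picks the stored letter that agrees with the scanned shape on the most pixels.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each "font" entry is a hypothetical 3x3 bitmap (1 = ink, 0 = blank).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
    "O": ("111", "101", "111"),
}


def count_matching_pixels(glyph, template):
    """Count the pixel positions where two bitmaps agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the stored letter that agrees with the glyph on the most pixels."""
    return max(
        TEMPLATES,
        key=lambda letter: count_matching_pixels(glyph, TEMPLATES[letter]),
    )


# A "T" with one stray pixel in the bottom-right corner.
noisy_t = ("111", "010", "011")
print(recognize(noisy_t))
```

Because the match only has to be the best one, not a perfect one, a slightly smudged letter can still be identified. The same looseness is also why a badly smudged letter can be mistaken for a different one.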
The technology for text recognition has improved immensely in just a few years, and it can now recognize almost any type of text, including handwriting. Despite this, 100% accuracy remains out of reach for OCR software with today’s technology.
Depending on the resolution and contrast of the scanned document and the precision of the software, you will receive text that’s about 80% to 90% accurate. Minor errors like missing letters or misspellings will occur even with the best text recognition software.
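If you are curious how an accuracy figure like that might be worked out, the short Python sketch below is one way to do it. It assumes that "accuracy" means character accuracy: count the single-character fixes (insertions, deletions and replacements) needed to turn the OCR result into the correct text, then compare that count to the length of the correct text. The article's figures may have been measured differently, and the function names are only illustrative.

```python
def edit_distance(a, b):
    """Smallest number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character
                    current[j - 1] + 1,      # insert a character
                    previous[j - 1] + cost,  # replace (or keep) a character
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Share of the reference text that needed no correction (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


# One letter is missing from the OCR result.
print(f"{character_accuracy('recognition', 'recogniton'):.0%}")
```

One missing letter in an 11-letter word already brings the score down to roughly 91%, which shows how quickly small slips add up on a full page.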
Still, unless you’re a fast typist, the amount of time that you’ll need to correct a few misspelled words is nowhere near the amount of time it takes to type a document manually.
While there are many software options for text recognition, their performance and accuracy vary widely based on the OCR engine they use. One of the most accurate open source OCR engines is the Tesseract engine.
FreeOCR is a free application that provides a simple graphical interface for the Tesseract engine. Besides its simple interface, FreeOCR supports most image files and PDF documents and is compatible with most scanning devices.
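For readers who are comfortable with a little scripting, the Tesseract engine can also be run on its own as a command-line program, without a graphical interface. The Python sketch below is only an illustration of that idea and is not part of FreeOCR. It assumes the separate tesseract program is already installed and available on your system path, and the function names and file names are made up for the example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language="eng"):
    """Build the command line: tesseract IMAGE OUTPUT_BASE -l LANGUAGE."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def run_ocr(image_path, output_base, language="eng"):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("The tesseract program was not found on this computer.")
    subprocess.run(
        build_tesseract_command(image_path, output_base, language),
        check=True,
    )
    # Tesseract writes its result to OUTPUT_BASE with ".txt" added.
    with open(f"{output_base}.txt", encoding="utf-8") as text_file:
        return text_file.read()


# Hypothetical file names, shown only to display the command that would run.
print(" ".join(build_tesseract_command("scan.png", "scan")))
```

Calling run_ocr("scan.png", "scan") would then read the image and hand back the text, but only on a computer where the command-line version of Tesseract is installed.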
You can download FreeOCR here.
After saving the freeocr.exe file on your computer, double-click on it and follow the instructions to install the application.
With the application installed, go to the desktop and double-click on its shortcut to open FreeOCR.
The interface is split into two windows to make the OCR process easy to understand. On the left, you can see the imported image, while on the right the extracted text is displayed.
You have three options for importing files into the application.
If you click the Scan button, your scanner interface will start and you’ll be able to scan your documents directly into the program.
Clicking the Open button allows you to select any image file on your computer and extract the text from it. The Open PDF button will do the same for PDF files.
Once you import an image file through one of these options, all you have to do is click the OCR button. This will start the OCR process. For best results, use an image that has a resolution of at least 300 dpi.
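If you are not sure whether an existing image meets the 300 dpi suggestion, a little arithmetic helps: dpi is simply pixels divided by inches. The Python sketch below assumes a US Letter page (8.5 by 11 inches), and the function names are only illustrative.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions a scan of this paper size needs at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixel_width, paper_width_in):
    """Resolution of an existing scan, judged from its width in pixels."""
    return pixel_width / paper_width_in


# A US Letter page scanned at 300 dpi.
print(pixels_needed(8.5, 11))

# An existing image that is 1275 pixels wide, scanned from a Letter page.
print(effective_dpi(1275, 8.5))
```

A Letter page at 300 dpi works out to 2550 by 3300 pixels. An image only 1275 pixels wide was scanned at 150 dpi, so rescanning at a higher setting would be worth a try before running OCR.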
After the conversion is complete, the text in the right window is fully editable. You can correct any errors right there. Once you’re satisfied with the results, click the blue W icon in the middle to export the text into Microsoft Word. Alternatively, click the button above it to transfer all the text to the clipboard.
By default, FreeOCR comes equipped to recognize text written in the English language. If you need to convert text for another language, you will have to install it separately.
To download extra language files for FreeOCR, click here.
Download the language file and save it on your computer. Since the archives are in tar.gz format, you will need an archive manager like 7-Zip to extract the files.
Now, open FreeOCR, click on Settings and then click the Open Language Folder button. This will open the tessdata folder. Copy the extracted language file to this folder and restart the OCR software.
You can now change the language from the OCR Language dropdown menu.
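If you install several languages, the extract-and-copy steps above can also be scripted. The Python sketch below is one possible approach and not something FreeOCR provides: it opens a tar.gz archive and copies every file inside it straight into a folder you name. The function name is made up, and the paths are placeholders you would replace with your downloaded archive and the tessdata folder that the Open Language Folder button shows.

```python
import shutil
import tarfile
from pathlib import Path


def install_language_files(archive_path, tessdata_dir):
    """Copy every file from a .tar.gz archive into the tessdata folder."""
    tessdata_dir = Path(tessdata_dir)
    tessdata_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip folders and links inside the archive
            # Keep only the file name, so nothing lands outside tessdata_dir.
            target = tessdata_dir / Path(member.name).name
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            copied.append(target.name)
    return sorted(copied)
```

After running it, restart FreeOCR just as you would after copying the files by hand.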
OCR technology has yet to mature, but it can still increase your productivity while at the computer.
Article printed from Worldstart's Tech Tips Newsletter: http://www.worldstart.com
URL to article: http://www.worldstart.com/converting-images-to-text/