It’s not unusual to receive a document by email as an image or a PDF file when what you really need is the text in an editable form. At other times you might scan paper documents and want to edit them in your word processor.
You can’t edit the text in an image or a scanned file directly, and unless you have time to spare, you won’t want to type everything in yourself. What you need is a technology that has been around for a while called Optical Character Recognition (OCR), which translates the text in images into text you can edit. Some of the best OCR software packages are OmniPage and FineReader, but they cost big money. Here’s a selection of mostly free ways to get your text converted into an editable form.
1. FreeOCR
FreeOCR is an OCR program based on the open source Tesseract engine, which is maintained by Google and considered to be very accurate. It accepts input directly from a scanner, from PDF files and from several image formats including multi-page TIFF files, and supports conversion in 11 different languages. You can also select specific parts of the input document for conversion, which is useful for multiple blocks or columns of text, and the output can be exported directly to Word or saved in Rich Text Format.
Pay attention during installation, as the program uses Install Manager to offer you a few bits of adware. FreeOCR works on Windows XP to Windows 8, and XP users need to have the .NET Framework v2 installed. FreeOCR can be used for commercial as well as personal purposes.
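Since FreeOCR is built on Tesseract, it is also possible to call the engine yourself if you have it installed separately. The following is only a minimal sketch, assuming the `tesseract` command-line program is on your PATH; the helper names `build_tesseract_command` and `ocr_image` are made up for this example and are not part of FreeOCR or Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Tesseract writes the recognized text to <output_base>.txt.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the tesseract executable and the requested language data
    are installed; raises an error if the program cannot be found.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_image("scan.png", "scan_text")` (with `scan.png` standing in for your own file) would leave the text in `scan_text.txt` and return it as a string.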
2. SimpleOCR
The SimpleOCR software is free for personal, educational and commercial use and accepts input from a scanner or from JPG, BMP and multi-page TIFF images. The resulting text can be saved as a standard text file or a Word document. When you run SimpleOCR for the first time after installing, make sure to select the top “Machine Print” option, which is free; the bottom option is a 14 day demo of the more advanced software. Then choose 1 of the 4 languages for your profile and click Select.
Several pages can be added by clicking the Add Page button and then converted using Convert to text. After the character recognition has completed, the resulting text is displayed in the lower window, with colored words flagging potential spelling issues. Blue marks suspect words, red marks words not found in the program’s dictionary, and so on. Each flagged word can be checked against a drop down list of suggested alternatives.
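The idea of flagging words that are missing from a dictionary and offering alternatives can be illustrated with a few lines of Python. This is only a rough sketch of the general technique using the standard `difflib` module; it is not how SimpleOCR works internally, and the function name `flag_unknown_words` and the tiny word list are invented for the example.

```python
import difflib
import re


def flag_unknown_words(text, dictionary, max_suggestions=3):
    """Map each word not found in the dictionary to a list of close matches."""
    # Sort the known words so the suggestions are produced deterministically.
    known = sorted({word.lower() for word in dictionary})
    known_set = set(known)
    flagged = {}
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known_set or lower in flagged:
            continue
        # Closest dictionary entries first, similar to a drop down of alternatives.
        flagged[lower] = difflib.get_close_matches(lower, known, n=max_suggestions)
    return flagged


# A common OCR slip is reading "m" as "rn".
sample_dictionary = ["scan", "the", "document", "documents", "now"]
suspects = flag_unknown_words("Scan the docurnent now", sample_dictionary)
```

Here `suspects` contains a single entry for the misread word, with the most similar dictionary words listed as candidates.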
3. i2OCR
i2OCR is a free, unlimited use online OCR service from Sciweavers.org that accepts images in the TIF, JPG, PNG, BMP, GIF, PBM, PGM and PPM formats. It supports a massive 33 languages, and although the maximum file size is limited to 10MB, that should be enough for most general use.
Usage is quite simple: select the button to browse for a file on your computer, or use the URL option to grab a file directly from an online location such as Dropbox. Choose your language from the drop down and click the big button to convert the file; the conversion took only a matter of seconds when tested. The conversion accuracy seems to be excellent, although the output is plain text only. It appears side by side with the original image lower in the window, where you can click on it to highlight and copy it to a document, or save it directly as a Word .DOC file. Sciweavers also has several other useful format conversion tools, including one for converting files to PDF.
4. Online OCR
Free Online OCR has a free and a paid service, with the free one allowing you to convert up to 15 pages per hour. You can upload JPG, BMP, TIF, PNG, PCX, GIF and multi-page PDF documents of up to 4MB each, and the text can be recognized in 1 of 32 languages. The output can be a Word document (DOC), an Excel spreadsheet (XLS) or a plain text file (TXT).
Choose your local file for upload, click the Upload button, enter the numbered captcha and set the language and output format you need. Then click Recognize and wait a few seconds while it converts. The resulting text will appear underneath along with a button to download it as the chosen file format.
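If you have a batch of files to convert, it can save time to check them against the free tier’s limits before uploading. The sketch below simply encodes the limits quoted above (the listed formats, 4MB per file, 15 pages per hour); the function name `check_upload` is made up for this example, and treating 4MB as 4 × 1024 × 1024 bytes and matching only the extensions exactly as listed are assumptions, since the service does not spell these out.

```python
from pathlib import Path

# Limits of the free service as quoted in the article.
ALLOWED_EXTENSIONS = {".jpg", ".bmp", ".tif", ".png", ".pcx", ".gif", ".pdf"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 1MB = 1024 * 1024 bytes
MAX_PAGES_PER_HOUR = 15


def check_upload(filename, size_bytes, pages_used_this_hour=0, pages_in_file=1):
    """Return a list of reasons the upload would be rejected (empty if none)."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 4MB")
    if pages_used_this_hour + pages_in_file > MAX_PAGES_PER_HOUR:
        problems.append("would exceed 15 pages per hour")
    return problems
```

An empty list means the file fits within the stated limits; anything else tells you what to fix, for example converting the file to a supported format or splitting a large PDF.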
5. Free Online OCR
This online service supports uploading the most popular image formats (JPG, GIF, BMP, PNG and TIFF) as well as OCR conversion of PDF documents. After conversion, the resulting text can be output as a Word DOC, Rich Text RTF, plain TXT or layered PDF document. The service also does its best to keep the text layout and formatting as close as possible to the original.
To use the service, simply choose your file for upload, select the format you want it saved as, then click the button. You get a nice progress meter to look at during the conversion, and a download button appears once it’s complete. Free Online OCR seemed to work quite well and kept font sizes and formatting in most cases. The service is free to use but makes no mention of file size or usage limits, which is slightly confusing, as we don’t know whether it’s truly unlimited or the restrictions simply haven’t been stated.
6. NewOCR
This free online OCR service certainly has a lot of input format support. It handles 9 common image formats, images inside Zip archives, multiple page documents such as PDF, TIFF and DjVu, and also DOCX and ODT files. The output list is smaller but still useful, with TXT, DOC and PDF file saving available. Recognition is handled by the Tesseract and Cuneiform engines, and the service can recognize a total of 58 languages as well as multi-column text and lower quality images.
To use NewOCR, simply select a local file or one from a URL, choose the recognition language and then press the Preview button. This loads a preview page with the OCR converted text displayed beneath it. If you can’t see the text, press the blue OCR button. The text can be exported in a variety of ways, including the standard download to one of the 3 file formats, copying to the clipboard, putting it through the Google or Bing translators, pasting online to Pastebin or Pastie and even sending it directly to Google Docs. NewOCR has unlimited uploads and doesn’t require any registration.
7. Microsoft Office Document Imaging
As we know, Microsoft Office isn’t a free product, but large numbers of users are likely to have some version of it installed. The Office Document Imaging tool can perform OCR on a document and the results are very good, but unfortunately it’s not readily available in all versions of Office. Office 2003 should include it in your installation by default, Office 2007 users will have to add it manually from the add components option, and it’s not included in Office 2010 by default at all. Instructions on how to add MODI to Office 2010 can be found at Microsoft.com.
The Microsoft Office Document Imaging option can be found in your Start Menu -> Programs -> Microsoft Office -> Microsoft Office Tools. It only recognizes TIFF images as an input source so you will probably need to convert your documents beforehand. Open the file and click on the eye icon in the toolbar called “Recognize Text Using OCR”. Then click on the button to its right to send the text straight to Word.
Editor’s Note: OCROnline was another free service we tested, but it only gives you 5 free single page conversions a week, which is a bit too restrictive, and you also have to create an account. The conversion quality is very good though, if you only need the odd page now and again.
Google Docs also has an option to convert PDF files and images to documents via OCR. Go to your Google Drive and click on Options -> Upload Settings -> Convert text from uploaded PDF and image files, and also select the confirm option. This will then ask you if you want to OCR an image or PDF when you upload a file to Google Drive.