Microsoft Office Document Imaging Scanner



The other day I got some work from a high school teacher: typing up past examination papers that she was compiling into a book. There were quite a lot of them, a whole file's worth to be exact.



Knowing this would be a challenge in terms of the time and effort I'd have to set aside, I decided to look for a much quicker solution. The first thing that came to mind was OCR (Optical Character Recognition).

OCR is basically the identification of text from image files. In layman’s terms, think of it as converting images to text.

OCR can thus save you the time and money you'd otherwise spend typing or outsourcing to professionals. In my case, I was able to cut the workload of this particular job by about 70-80%, and the savings would have been even greater were it not for a few misrecognized characters and some diagrams that needed touching up.

For my OCR needs I went with MS Office. I had considered other options before settling on it, such as the feature-rich PDF-XChange Editor, which bundles OCR with its PDF viewer. Ultimately, however, they all proved less capable than the OCR engine in MS Office, which was both more accurate and faster.

I suppose that could be attributed to their use of the free Tesseract OCR engine, which, while powerful in its own right, tends to be outperformed by commercial alternatives.

Getting Started: Microsoft Office OCR Options

MS Office does OCR in two ways:

  • Using OneNote
  • Using Microsoft Office Document Imaging (MODI)

Any version of OneNote (2007-2016) will do for this purpose. MODI is a different story because it was discontinued: MS Office 2007 was the last version to include it.

However, you don't need MS Office 2007 itself to use MODI, as it can be installed separately and used alongside newer versions of MS Office.


What You’ll Need

  • First things first, you'll need MS Office installed. Any version from Office 2007, 2010, 2013 or 2016 will do.
  • SharePoint Designer 2007, to install MODI. Microsoft provides SharePoint Designer 2007 as a free download; get it from Microsoft's download centre.
  • Alternatively, MS Office 2007, to install MODI. If you already have a licensed copy of MS Office 2007, you can use it instead of downloading SharePoint Designer 2007.
  • An image to OCR.
  • A scanner, if you want to OCR during the scanning process.

1. OCR with OneNote

1. Launch OneNote and start by creating a New Note.

2. In the ribbon, go to the Insert tab and insert the image to OCR.

Insert Image to OCR

3. Inside the note, right-click the inserted image and select Copy Text from Picture.

Copy Text from Image

4. Open MS Word or a text editor and paste the text that has been recognized.

5. Alternatively, you can search the text within OneNote instead of copy-pasting it elsewhere. To do that, right-click the inserted image, select Make Text in Image Searchable, and then select the language the text is in.

Make Text in Image Searchable

You can then use Ctrl+F to search for text inside the image. If it finds a match, it will be highlighted.

NOTE: If you need a different language, check the bottom of this post on how to install additional language packs.

2. OCR with Microsoft Office Document Imaging (MODI)

Step 1. Installing MODI

1. Run the SharePoint Designer 2007 or MS Office 2007 setup.

2. Select the Customize installation option.

3. Set all the available options to Not Available, then expand Office Tools and set Microsoft Office Document Imaging to Run all from My Computer.

Install MODI

4. Now leave it to install.

Step 2: OCR with MODI

MODI can OCR in two ways:

  • By OCRing existing image files
  • By connecting to your scanner and automatically running OCR once the scan is complete

i. OCR an Image

MODI only OCRs images in TIFF (*.tif, *.tiff) format. If your picture is in another format (e.g. JPEG, PNG, GIF), you can use one of the many free image editors available online (XnView, IrfanView, etc.) to convert it to TIFF.


You can even use Paint to do the conversion. Just open the image in Paint, choose Save as, then select Other Formats. In the Save dialog, select the TIFF type and save the image.
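
Note that renaming a .jpg to .tif does not convert it. A real TIFF file starts with the bytes `II*\0` (little-endian) or `MM\0*` (big-endian), followed by a chain of image file directories (IFDs), one per page. The minimal Python sketch below is not part of MODI; `tiff_page_count` is a hypothetical helper that checks the header and counts the pages, which also tells you whether you'll need the All Pages option when sending the text to Word. It assumes a classic TIFF and does not handle the BigTIFF variant.

```python
import struct


def tiff_page_count(data: bytes) -> int:
    """Count the pages (IFDs) in a classic TIFF file's raw bytes.

    Raises ValueError if the data is not a classic TIFF (e.g. a renamed JPEG).
    """
    # The first 4 bytes give the byte order plus the magic number 42.
    magic = data[:4]
    if magic == b"II*\x00":
        order = "<"  # little-endian ("Intel")
    elif magic == b"MM\x00*":
        order = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a classic TIFF file")
    if len(data) < 8:
        raise ValueError("truncated TIFF header")

    # Bytes 4-7 hold the offset of the first image file directory (IFD).
    (offset,) = struct.unpack(order + "I", data[4:8])
    pages = 0
    seen = set()
    while offset != 0:
        # Guard against truncated files and IFD chains that loop back on themselves.
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        (entries,) = struct.unpack(order + "H", data[offset:offset + 2])
        # Each IFD entry is 12 bytes; the offset of the next IFD follows them.
        next_pos = offset + 2 + entries * 12
        if next_pos + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(order + "I", data[next_pos:next_pos + 4])
        pages += 1
    return pages
```

Call it with the bytes of your file, e.g. `tiff_page_count(Path("scan.tif").read_bytes())` with `pathlib.Path`, where scan.tif is a placeholder name. A ValueError means the file still needs converting before MODI will open it.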

Once you have your images in this format, do the following:

1. From the Start menu, go to Programs and, inside Microsoft Office Tools, open Microsoft Office Document Imaging.

2. Inside MODI, click the Open icon and select your TIFF image from the dialog.

Open Image

3. Once the image is loaded inside MODI, click the Recognize Text Using OCR button.

4. Give the OCR time to finish. Once it's done, click the Send Text to Word button.


Send Text to Word

5. A dialog will pop up with options for sending the text. If the TIFF has multiple pages, make sure to select the All Pages option. If the image contains pictures or diagrams that you'd like to export as well, check the Maintain pictures in output option. Click the OK button.

Send Text to Word

6. The recognized text and any pictures MODI found will be exported to an HTML file, which opens in whichever version of Word you have installed.

ii. OCR directly from the Scanner

1. Connect your scanner and load the item to scan.

2. From the Start menu, go to Programs and, inside Microsoft Office Tools, open Microsoft Office Document Scanning.

3. In the scanning window, click the Scanner button and select your scanner.

Select Scanner

4. Depending on the nature of the item you're scanning, select a suitable color preset: Color, Grayscale or Black and White.

5. Click the Scanning button. Your item will be scanned, and once scanning is done, OCR will run automatically.

6. The recognized text will then be opened in MODI. Finish by clicking the Send Text to Word button to transfer the recognized text and any pictures to Word.

NOTE:
In the default save folder, you'll find the HTML file containing the OCR output. Check the HTML file's corresponding folder for any pictures, such as diagrams, that MODI has exported.

Language Support for MODI and OneNote

The OCR feature in OneNote and MODI comes with built-in support for only three languages: English, French and Spanish. By default, it uses the language of your installed copy of MS Office. To change the language MODI uses for OCR, do the following:

  • Open Microsoft Office Document Imaging
  • Go to Tools > Options…
  • Click the OCR tab and choose a language from the OCR Language list
Select OCR Language

For other languages, particularly those written in a completely different script from English, such as Greek, Korean, Chinese, Japanese, Arabic, or languages written in Cyrillic (Slavic languages such as Russian, Bulgarian, Serbian and Ukrainian), you'll have to install the corresponding Language Pack for OCR to work in OneNote or MODI.

1. Installing OCR Language Packs for OneNote

To install a language pack for OneNote to OCR with:

  • Open OneNote and go to: Options > Language.
  • Add the language from the drop-down menu. When it appears in the languages box, click the Not Installed link in the Proofing column.
Add Language

That will take you to the Microsoft Office support site, where you can download the free language packs. Make sure to download the language pack that matches the version of MS Office you're using, i.e. the 32-bit or 64-bit edition of MS Office 2010, 2013 or 2016.


Download Language Pack

2. Installing OCR Language Packs for MODI

For MODI, the process is a bit more complicated, but there's a really good guide on how to go about installing the language packs here.

If all this sounds like a lot of work, you can opt for Tesseract instead, which supports a wide range of languages. Tesseract is a command-line program, but you can find a couple of GUIs (versions with a graphical user interface) for it online, such as this one.
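
If you go the Tesseract route, a typical command-line call is `tesseract IMAGE OUTPUTBASE -l LANG`, which writes the recognized text to OUTPUTBASE.txt; several languages can be combined with `+` (e.g. `eng+rus`). The Python sketch below assumes Tesseract and the relevant language data files are already installed. The helper names and the language table are illustrative; the codes follow Tesseract's traineddata naming. Unlike MODI, Tesseract reads PNG and JPEG directly, so no TIFF conversion is needed.

```python
import shutil
import subprocess

# Tesseract traineddata codes for some of the languages mentioned above
# (illustrative table; each language's data file must be installed separately).
TESSERACT_CODES = {
    "English": "eng",
    "French": "fra",
    "Spanish": "spa",
    "Greek": "ell",
    "Korean": "kor",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Japanese": "jpn",
    "Arabic": "ara",
    "Russian": "rus",
    "Bulgarian": "bul",
    "Serbian": "srp",
    "Ukrainian": "ukr",
}


def tesseract_command(image_path: str, output_base: str, languages: list[str]) -> list[str]:
    """Build the argument list for `tesseract IMAGE OUTPUTBASE -l LANGS`."""
    codes = "+".join(TESSERACT_CODES[name] for name in languages)
    return ["tesseract", image_path, output_base, "-l", codes]


def run_tesseract(image_path: str, output_base: str, languages: list[str]) -> None:
    """Run Tesseract; the recognized text ends up in OUTPUTBASE.txt."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    subprocess.run(tesseract_command(image_path, output_base, languages), check=True)
```

For instance, `run_tesseract("exam1.png", "exam1", ["English"])` would leave the text in exam1.txt (the file names are hypothetical).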

The scanned text of my document is hard to read

Try another scanning preset

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, select a preset from the list.

    The Black and white preset is optimized for optical character recognition (OCR) when scanning black text on white paper. If you are scanning a document with color content, lighter text, or colored paper, try the Black and white from color page preset. The Grayscale preset is good for scanning continuous-tone images (such as photographs) with text.
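
The preset advice scattered through this troubleshooting section can be condensed into one small decision rule. The Python function below is a hypothetical helper, not part of MODI; it just returns the name of the preset the text recommends for a given kind of page. The Color rule comes from the photographs-and-artwork tip in the next troubleshooting item.

```python
def suggest_preset(continuous_tone: bool, has_text: bool, has_color: bool,
                   light_text_or_colored_paper: bool = False) -> str:
    """Return the scanning preset recommended by the tips in this section.

    continuous_tone: the page contains photographs or artwork
    has_text: the page contains text you want to OCR
    has_color: the page has color content
    light_text_or_colored_paper: light text, or text printed on colored paper
    """
    if continuous_tone:
        if has_text:
            # Continuous-tone images (such as photographs) with text
            return "Grayscale"
        # Photographs and artwork: Color, or Grayscale for non-color content
        return "Color" if has_color else "Grayscale"
    if has_color or light_text_or_colored_paper:
        return "Black and white from color page"
    # Black text on white paper: the preset optimized for OCR
    return "Black and white"
```

For an ordinary exam paper, `suggest_preset(continuous_tone=False, has_text=True, has_color=False)` returns "Black and white".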

Increase the scanning resolution

You can only change the scanning resolution if the Show scanner driver dialog before scanning check box is clear in the Choose Scanner dialog box.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list.
  3. Click Preset options, and then click Edit selected preset.
  4. In the Preset Options dialog box, click Advanced.
  5. Click the arrow to the right of the Resolution (DPI) list and choose a higher number.

    Note: The higher the resolution, the more space the resulting file will require on your hard disk.
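
To see why resolution matters so much, it helps to do the arithmetic: the pixel count grows with the square of the DPI, so doubling the resolution quadruples the raw image data. The sketch below estimates the uncompressed size of one page. It assumes 1 bit per pixel for Black and white, 8 for Grayscale and 24 for Color; these are typical values, not figures documented for MODI, and real files will be smaller because TIFF images are usually compressed.

```python
# Assumed bit depths per preset (typical values, not taken from MODI's documentation).
BITS_PER_PIXEL = {"Black and white": 1, "Grayscale": 8, "Color": 24}


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size, in bytes, of one scanned page.

    Each row of pixels is padded to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round bits up to whole bytes
    return row_bytes * height_px


# Example: a US Letter page (8.5 x 11 inches) at 300 and 600 DPI.
for preset, bpp in BITS_PER_PIXEL.items():
    for dpi in (300, 600):
        size_mb = raw_page_bytes(8.5, 11, dpi, bpp) / 1_000_000
        print(f"{preset:16} {dpi} DPI: {size_mb:6.1f} MB")
```

At 300 DPI this works out to roughly 1 MB per page in Black and white, 8.4 MB in Grayscale and 25 MB in Color before compression, and each figure grows fourfold at 600 DPI.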

The scanned pictures in my document are hard to see

Try another scanning preset

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list.

    The Color preset is optimized for scanning photographs and artwork. If you are scanning a document with continuous-tone, non-color content, try the Grayscale preset.

Increase the scanning resolution

You can only change the scanning resolution if the Show scanner driver dialog before scanning check box is clear in the Choose Scanner dialog box.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset.
  3. Click Preset options, and then click Edit selected preset.
  4. In the Preset Options dialog box, click Advanced.
  5. Click the arrow to the right of the Resolution (DPI) list and choose a higher number.

    Note: The higher the resolution, the more space the resulting file will require on your hard disk.

OCR results are inaccurate

Your images are improperly aligned

If your scanned pages are slightly misaligned, or scanned sideways or upside down, OCR quality will suffer. First, be sure the page is placed straight on the scanner bed. With a document that has already been scanned, try the following:

  1. On the Tools menu, click Options, and then click the OCR tab.
  2. Ensure that the Auto rotate and Auto straighten check boxes are selected.
  3. Click OK.
  4. On the Tools menu, click Recognize Text Using OCR to rerun OCR on the document and apply the two alignment options.
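
Even a small tilt adds up across a line of text. As a rough illustration (not a description of how MODI's Auto straighten works), the sketch below computes how far the baseline drifts vertically from one end of a line to the other, and compares that with the height of 10-point text at the same resolution. With an assumed 6.5-inch line scanned at 300 DPI, a 1° tilt shifts the end of the line by about 34 pixels, more than 80% of the roughly 42-pixel height of 10-point text, which makes it much harder to tell one line of text from the next.

```python
import math


def baseline_drift_px(angle_deg: float, line_width_in: float, dpi: int) -> float:
    """Vertical drift, in pixels, from one end of a text line to the other
    when the page is tilted by angle_deg degrees."""
    return math.tan(math.radians(angle_deg)) * line_width_in * dpi


def text_height_px(point_size: float, dpi: int) -> float:
    """Height of text of the given point size in pixels (1 point = 1/72 inch)."""
    return point_size / 72 * dpi


# Assumed example: a 6.5-inch line of 10-point text scanned at 300 DPI.
for angle in (0.5, 1, 2):
    drift = baseline_drift_px(angle, 6.5, 300)
    ratio = drift / text_height_px(10, 300)
    print(f"{angle} deg tilt: {drift:.1f} px drift ({ratio:.0%} of the text height)")
```
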
There isn't enough contrast between the document text and background

Use clean pages with high contrast between the text and the background, and with a uniform background. OCR does not perform as well on low-contrast pages or on a background with varying color or brightness.
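
One way to see why contrast matters: before recognizing characters, OCR software typically separates dark "ink" pixels from the light background, often with a single global threshold. The Python sketch below uses Otsu's method, a classic way to pick such a threshold from the gray-level histogram; it is not necessarily what MODI does internally, and the pixel values are made up for illustration. On a clean, high-contrast page the threshold separates text from background perfectly, while on an uneven background with a patch darker than the text, every possible threshold misclassifies some pixels.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick a gray level t (0-255) so that pixels <= t ("ink") and > t ("paper")
    are as well separated as possible (Otsu's method: maximize the
    between-class variance)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    ink_count = 0  # pixels with gray level <= t
    ink_sum = 0    # sum of their gray levels
    for t in range(256):
        ink_count += hist[t]
        ink_sum += t * hist[t]
        paper_count = total - ink_count
        if ink_count == 0 or paper_count == 0:
            continue
        ink_mean = ink_sum / ink_count
        paper_mean = (total_sum - ink_sum) / paper_count
        # Between-class variance, scaled by the constant total**2.
        score = ink_count * paper_count * (ink_mean - paper_mean) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def misclassified(text: list[int], background: list[int], t: int) -> int:
    """Count text pixels taken for paper plus background pixels taken for ink."""
    return sum(p > t for p in text) + sum(p <= t for p in background)


# Hypothetical gray levels (0 = black, 255 = white).
clean_text, clean_paper = [20] * 50 + [30] * 50, [220] * 200 + [240] * 200
# Light-gray text on paper with an uneven, partly darker background.
faint_text, uneven_paper = [120] * 100, [110] * 100 + [200] * 300

for name, text, paper in [("clean", clean_text, clean_paper),
                          ("low contrast", faint_text, uneven_paper)]:
    t = otsu_threshold(text + paper)
    print(f"{name}: threshold {t}, {misclassified(text, paper, t)} pixels misclassified")
```
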

Your document is in a foreign language

Even if their alphabets are similar, you should specify the language of the document you are scanning if it is different from the Office language setting.

  1. On the Tools menu, click Options, and then click the OCR tab.
  2. In the OCR Language list, select the language you want, and then click OK.
  3. On the Tools menu, click Recognize Text Using OCR.

Try another scanning preset

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list.

    Notes

    • The Black and white preset is optimized for OCR when scanning black text on white paper. If you are scanning a document with color content, lighter text, or colored paper, try the Black and white from color page preset.

    • Whenever possible, scan pages in black and white. A good black and white scan keeps broken character strokes, blurred strokes, touching characters, and speckles to a minimum.

Increase the scanning resolution

For Asian characters equal to or smaller than the equivalent English 10 point font size, try a scanning resolution of higher than 300 dpi.

You can only change the scanning resolution if the Show scanner driver dialog before scanning check box is clear in the Choose Scanner dialog box.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, select a preset from the list.
  3. Click Preset options, and then click Edit selected preset.
  4. In the Preset Options dialog box, click Advanced.
  5. Click the arrow to the right of the Resolution (DPI) list and choose a higher number.

    Note: The higher the resolution, the more space the resulting file will require on your hard disk.
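
The 300 DPI guideline for Asian characters can be read in terms of pixels per character: 10-point text is 10/72 of an inch tall, or about 42 pixels at 300 DPI. If you assume (an assumption, not a documented MODI rule) that recognition needs roughly the same number of pixels per character whatever the font size, the resolution has to scale inversely with the point size. The hypothetical helper below does that arithmetic; your scanner driver offers a fixed set of resolutions, so you would pick the nearest available setting above the result.

```python
def dpi_for_same_detail(point_size: float, ref_point_size: float = 10,
                        ref_dpi: int = 300) -> float:
    """DPI that gives `point_size`-point text the same height in pixels as
    `ref_point_size`-point text scanned at `ref_dpi`.

    Height in pixels = points / 72 * dpi, so keeping it constant means the
    DPI scales with ref_point_size / point_size.
    """
    ref_height_px = ref_point_size / 72 * ref_dpi  # about 41.7 px for 10 pt at 300 DPI
    return ref_height_px * 72 / point_size


for size in (10, 9, 8, 6):
    print(f"{size}-point characters: about {dpi_for_same_detail(size):.0f} DPI")
```

Under this assumption, 8-point characters would call for about 375 DPI and 6-point characters for about 500 DPI.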

Scanning takes too long

Try another scanning preset

All of the built-in presets have been designed to produce scans of optimum quality at the fastest scanning speed and the smallest file size.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list. The Black and white preset provides the fastest scanning speed.

Decrease the scanning resolution

You can only change the scanning resolution if the Show scanner driver dialog before scanning check box is clear in the Choose Scanner dialog box.

The higher the dots-per-inch resolution you are using to scan your document, the longer it will take to scan. In addition, the size of the file on your hard disk will increase.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, select a preset from the list.
  3. Click Preset options, and then click Edit selected preset.
  4. In the Preset Options dialog box, click Advanced.
  5. Click the arrow to the right of the Resolution (DPI) list and choose a lower number.

    Note: Optical character recognition (OCR) may lose accuracy at lower resolutions.

The scanned files are too large

Try another scanning preset

All of the built-in presets have been designed to produce scans of optimum quality at the fastest scanning speed and the smallest file size.


  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list.

    The Black and white or Black and white from color page presets will produce the smallest file sizes.

Decrease the scanning resolution

You can only change the scanning resolution if the Show scanner driver dialog before scanning check box is clear in the Choose Scanner dialog box.

The higher the dots-per-inch resolution you are using to scan your document, the longer it will take to scan. In addition, the size of the file on your hard disk will increase.

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, select a preset from the list.
  3. Click Preset options, and then click Edit selected preset.
  4. In the Preset Options dialog box, click Advanced.
  5. Click the arrow to the right of the Resolution (DPI) list and choose a lower number.

    Note: Optical character recognition (OCR) may lose accuracy at lower resolutions.

I can't select text in the page pane

Check to see if Pan mode is activated


The page pane has two modes: Select and Pan. When Pan is activated, the mouse cursor is in the shape of a hand. The two modes are toggles, meaning that only one can be active at a time.

  • Open the View menu. If the Pan command is selected, click Select to enable text selection in the page pane.

Check to see if Select Annotations mode is activated

You cannot select text when the Select Annotations command is selected.

  • Open the View menu. If the Select command is not selected, click Select to enable text selection in the page pane.

You need to run OCR


You can select text in the page pane only after optical character recognition (OCR) has been performed. To do so, click Recognize Text Using OCR on the Tools menu.

I can't scan multiple pages with the automatic document feeder

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click Scanner, and then make sure the correct scanner is selected in the Scanner list.
  3. Make sure that the Use automatic document feeder check box is selected.

Each scanned page is saved as a separate document

  1. On the File menu, click Scan New Document.
  2. In the Scan New Document dialog box, click a preset from the list.
  3. Click Preset options, and then click Edit selected preset.
  4. On the Page tab of the Preset Options dialog box, clear the Save each page as a separate document check box.