Recently I wrote a post about my search for a TIFF iFilter that would let me use VBScript to query a Windows Indexing Service server for file management. I found that since OCR is never 100% accurate, neither were my attempts at sorting all the inbound EMR faxes we get each day. I did, however, find Tesseract, a great OCR engine that was originally developed by HP as proprietary software and is now developed by Google as open source under the Apache License v2. It is one of the most accurate open source OCR engines available. It is quite basic: the version you obtain from the project page only operates from the command line and, without the libtiff library, will only do its work on uncompressed TIFFs. More information can be found on the project pages and Wikipedia. Doing some scouring, I also found a front-end, and ArchivistaBox, a complete document management system. I'm using it in Windows, so I needed to do one of