OCR Application: Optimizing text recognition accuracy with freeform recognition and OCR training
For optimal OCR (Optical Character Recognition), the freeform module can be supplemented by a training module that enables controlled, self-learning text recognition. Both modules are optional components of inPuncto’s inbox processing server biz²ScanServer.
The OCR application is started via an OCR protocol, in which the customer-specific recognition settings are configured. Within the OCR application, the freeform text recognition module extracts various data from a document (e.g. invoice date, IBAN, order number, delivery note number). If extraction fails entirely or partially, the document can be forwarded, automatically or manually, to OCR training, i.e. the training workspace.
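The page does not describe how the freeform module locates values. As a rough sketch only, assuming simple regular expressions as a stand-in for the real recognition logic, the following Python illustrates the idea of searching the whole text for each field (rather than looking at fixed positions) and classifying the outcome as complete, partial, or failed. `PATTERNS`, `extract_freeform`, and `extraction_status` are hypothetical names; the IBAN check is the standard mod-97 check (ISO 7064 MOD 97-10).

```python
import re

# Hypothetical stand-in patterns: the page does not say how the
# freeform module actually locates values.
PATTERNS = {
    "invoice_date": re.compile(r"\b(\d{2}\.\d{2}\.\d{4})\b"),
    "order_number": re.compile(r"Order\s*(?:No\.?|number)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "iban": re.compile(r"\b([A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?)\b"),
}


def iban_is_valid(candidate: str) -> bool:
    """Standard IBAN check (ISO 7064 MOD 97-10)."""
    iban = candidate.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # Letters become two-digit numbers (A=10 ... Z=35); int(ch, 36) does exactly that.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_freeform(text: str) -> dict[str, str]:
    """Search the whole text for each field; no positional knowledge is used."""
    found: dict[str, str] = {}
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(1)
            if name == "iban" and not iban_is_valid(value):
                continue  # reject candidates that fail the checksum
            found[name] = value
            break  # keep the first plausible hit
    return found


def extraction_status(found: dict[str, str], required: list[str]) -> str:
    """Classify the result as 'complete', 'partial' or 'failed'."""
    missing = [f for f in required if f not in found]
    if not missing:
        return "complete"
    return "failed" if len(missing) == len(required) else "partial"
```

A document whose status comes back as partial or failed would be a candidate for forwarding to the training workspace.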
The OCR training module, also an optional component of the inbox processing server biz²ScanServer, is used to train the position of values for the individual index fields. The index field positions determined during OCR training are saved in a database per document type, so that they can be retrieved for the next document of that type and the OCR application knows where to look for each piece of information in the document.
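A minimal sketch of such a position store, assuming a SQLite table keyed by document type and index field, with boxes stored as page-relative coordinates between 0 and 1. The schema, the document-type key format (e.g. "ACME GmbH/invoice"), and the function names are hypothetical; the page does not specify how biz²ScanServer structures its database.

```python
import sqlite3

# Hypothetical schema: one row per (document type, index field).
SCHEMA = """
CREATE TABLE IF NOT EXISTS field_positions (
    doc_type   TEXT NOT NULL,
    field_name TEXT NOT NULL,
    page       INTEGER NOT NULL,
    x0 REAL NOT NULL, y0 REAL NOT NULL,
    x1 REAL NOT NULL, y1 REAL NOT NULL,
    PRIMARY KEY (doc_type, field_name)
)
"""

Box = tuple[float, float, float, float]  # x0, y0, x1, y1 (page-relative)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_position(conn: sqlite3.Connection, doc_type: str, field_name: str,
                  page: int, box: Box) -> None:
    """Store (or overwrite) the trained position of one index field."""
    x0, y0, x1, y1 = box
    with conn:  # the connection context manager commits on success
        conn.execute(
            "INSERT OR REPLACE INTO field_positions VALUES (?, ?, ?, ?, ?, ?, ?)",
            (doc_type, field_name, page, x0, y0, x1, y1),
        )


def load_positions(conn: sqlite3.Connection, doc_type: str) -> dict[str, tuple[int, Box]]:
    """Return {field_name: (page, box)} for the next document of this type."""
    rows = conn.execute(
        "SELECT field_name, page, x0, y0, x1, y1 FROM field_positions WHERE doc_type = ?",
        (doc_type,),
    ).fetchall()
    return {name: (page, (x0, y0, x1, y1)) for name, page, x0, y0, x1, y1 in rows}
```

Retraining a field simply overwrites its row, so the latest trained position is the one used for subsequent documents.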
A brief overview of the "OCR training" process for documents (invoices, delivery notes, etc.):
Documents are trained in a "training workspace".
- Training a document does not have to cover all index fields; for certain document types, some index fields may not be detected at all.
- During training, you use the mouse pointer to indicate where each field to be trained is located in the document.
- Since all recognition results for every document are stored in OCR training, a simple mouse click is enough to indicate where the data for a given field is located.
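Because the recognition results (words with their positions) are already stored, a click only has to be mapped to the word under the pointer. The following is a hypothetical sketch, assuming page-relative coordinates and a small click tolerance; `Word` and `word_at` are illustrative names, not part of the product.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """One stored recognition result (hypothetical structure)."""
    text: str
    page: int
    box: tuple[float, float, float, float]  # x0, y0, x1, y1, page-relative 0..1


def distance_to_box(x: float, y: float, box: tuple[float, float, float, float]) -> float:
    """0.0 if the point lies inside the box, else the distance to the box."""
    x0, y0, x1, y1 = box
    dx = max(x0 - x, 0.0, x - x1)
    dy = max(y0 - y, 0.0, y - y1)
    return math.hypot(dx, dy)


def word_at(words: list[Word], page: int, x: float, y: float,
            tolerance: float = 0.01) -> Optional[Word]:
    """Map a mouse click to the recognized word under (or just next to) the pointer."""
    on_page = [w for w in words if w.page == page]
    if not on_page:
        return None
    nearest = min(on_page, key=lambda w: distance_to_box(x, y, w.box))
    return nearest if distance_to_box(x, y, nearest.box) <= tolerance else None
```

The box of the selected word is what would then be recorded as the trained position of the field.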
There are two ways to send documents to OCR training: the OCR application includes a module that determines whether a document should be trained, and the validation user can also explicitly forward documents to training if the recognition results are unsatisfactory.
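Both routes can be expressed as a single decision. This sketch assumes the automatic check is a per-field confidence threshold (0.8 here, chosen arbitrarily); the actual criteria the OCR application uses are not described on this page, and `route_to_training` is a hypothetical name.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # hypothetical value, not taken from the product


def route_to_training(confidences: dict[str, float], required: list[str],
                      user_flagged: bool = False,
                      threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return 'manual', 'automatic', or None if the document needs no training."""
    if user_flagged:
        # The validation user explicitly forwarded the document.
        return "manual"
    # Automatic check: a required field is missing or below the threshold.
    weak = [f for f in required if confidences.get(f, 0.0) < threshold]
    return "automatic" if weak else None
```

Treating a missing field as confidence 0.0 lets the same rule cover both complete and partial recognition failures.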
Different optimization strategies: OCR training can be carried out either in advance (before going into production) or during production operation.
To get the best result from both modules (freeform text recognition and OCR training), a "voter" downstream of the OCR application compares their results and weights them accordingly. The final result, a combination of both modules, achieves a higher recognition rate than the freeform text recognition module alone.
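One simple way to realize such a voter is weighted voting over candidate values: each module adds its confidence, multiplied by a module weight, to the value it proposes, and the value with the highest total wins. This is a hypothetical illustration; the weights (1.0 for freeform, 1.5 for training) and the normalization are assumptions, not inPuncto's documented weighting.

```python
Candidate = tuple[str, float]  # (recognized value, confidence 0..1)


def vote(freeform: dict[str, Candidate], trained: dict[str, Candidate],
         w_freeform: float = 1.0, w_trained: float = 1.5) -> dict[str, Candidate]:
    """Weighted vote per index field; the default weights are hypothetical."""
    total = w_freeform + w_trained
    result: dict[str, Candidate] = {}
    for name in sorted(freeform.keys() | trained.keys()):
        scores: dict[str, float] = {}
        for source, weight in ((freeform, w_freeform), (trained, w_trained)):
            if name in source:
                value, conf = source[name]
                # Agreeing modules add up their weighted confidence for the same value.
                scores[value] = scores.get(value, 0.0) + weight * conf
        best = max(scores, key=scores.get)
        # Normalize so that full agreement at confidence 1.0 gives 1.0.
        result[name] = (best, scores[best] / total)
    return result
```

For example, if freeform recognition reads an order number as "P0-4711" with confidence 0.6 and the trained position yields "PO-4711" with 0.8, the trained value wins (1.2 vs. 0.6); values that both modules agree on collect score from both, which is why the combination can outperform either module alone.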
In this video, we show you how to use the OCR training workstation to train and optimize OCR recognition:
Your inPuncto Team
Telephone: +49 (0) 711 66 188 500