OCR Software: Optimizing Text Recognition Accuracy with Freeform and OCR Training
For optimal OCR (Optical Character Recognition), the freeform module can be supplemented by a training module to enable controlled, self-learning text recognition. Both modules are optional components of inPuncto’s inbox processing server biz²ScanServer.
The OCR software application is started via an OCR protocol, in which the customer-specific recognition themes are configured. Within the application, the freeform text recognition module extracts various data (e.g. invoice date, IBAN, order number, delivery note number) from a document. If this process fails entirely or partially, the document can be forwarded automatically or manually to the OCR training, i.e. the training workspace.
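To make the idea of "entire or partial failure" concrete, here is a minimal sketch of field extraction in Python. It is not the freeform module itself, whose internals are not described here. It assumes the OCR text is already available as a string and uses simple regular expressions as a stand-in. The field labels, the patterns, and the function names `extract_fields` and `missing_fields` are hypothetical.

```python
import re

# Hypothetical patterns; a real recognition theme would be customer-specific.
PATTERNS = {
    "invoice_date": re.compile(r"Invoice date:\s*(\d{2}\.\d{2}\.\d{4})"),
    "order_number": re.compile(r"Order number:\s*(\w+)"),
    "delivery_note_number": re.compile(r"Delivery note number:\s*(\w+)"),
}


def extract_fields(text):
    """Return one entry per index field; None marks a field that was not found."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def missing_fields(fields):
    """List the fields for which extraction failed."""
    return [name for name, value in fields.items() if value is None]


if __name__ == "__main__":
    sample = "Invoice date: 05.01.2023\nOrder number: 4711\n"
    fields = extract_fields(sample)
    print(fields)
    print("missing:", missing_fields(fields))
```

In this sketch, an empty list from `missing_fields` would correspond to a fully recognized document, a partial list to a partial failure, and a list of all fields to a complete failure.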
The OCR training module is an optional component of the inbox processing server biz²ScanServer. In it, the positions of values can be trained for the individual index fields. The index field positions determined for a given document type during OCR training are then saved in a database, so that they are available for the next document of this type and the OCR application knows where to look for the respective information in the document.
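The following sketch illustrates how trained positions could be stored per document type and looked up again for the next document of that type. It is an illustration only: the actual database layout of biz²ScanServer is not described here, so the table `field_positions`, its columns, and the helper functions are assumptions. It uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS field_positions (
               doc_type TEXT NOT NULL,
               field    TEXT NOT NULL,
               page     INTEGER NOT NULL,
               x REAL NOT NULL, y REAL NOT NULL,
               width REAL NOT NULL, height REAL NOT NULL,
               PRIMARY KEY (doc_type, field)
           )"""
    )
    return conn


def save_position(conn, doc_type, field, page, box):
    """Store or overwrite the trained position of one index field."""
    x, y, width, height = box
    conn.execute(
        "INSERT OR REPLACE INTO field_positions VALUES (?, ?, ?, ?, ?, ?, ?)",
        (doc_type, field, page, x, y, width, height),
    )
    conn.commit()


def lookup_positions(conn, doc_type):
    """Return {field: (page, (x, y, width, height))} for a document type."""
    rows = conn.execute(
        "SELECT field, page, x, y, width, height "
        "FROM field_positions WHERE doc_type = ?",
        (doc_type,),
    ).fetchall()
    return {f: (page, (x, y, w, h)) for f, page, x, y, w, h in rows}


if __name__ == "__main__":
    conn = open_store()
    # Coordinates are relative page fractions in this sketch (an assumption).
    save_position(conn, "invoice_supplier_a", "invoice_date", 1, (0.62, 0.10, 0.20, 0.03))
    print(lookup_positions(conn, "invoice_supplier_a"))
```

Keying the table on document type and field means that retraining a field simply replaces its earlier position, and a document type that has never been trained yields an empty result.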
A brief overview of the "OCR training" process for documents (invoices, delivery notes, etc.):
If a document needs to be trained, this is done in a "training workspace".
- Training a document does not necessarily cover all index fields, since not every index field can be detected for every document type.
- During training, you use your cursor to indicate where each field to be trained is located in the document.
- Because all detection results for every document are saved in the OCR training, a single mouse click is enough to indicate where the data relevant to a given field is located.
There are two ways of sending documents to OCR training. First, the OCR software includes a module which determines whether a document needs to be trained. Second, the validation user can explicitly forward a document to the training if they find the detection results unsatisfactory.
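The two routes into OCR training can be sketched as a single decision function. The criteria the actual module applies are not specified here. This example assumes that each field result carries a confidence value between 0 and 1, and that a document is routed to training when a required field is missing or falls below a threshold. The names `FieldResult`, `needs_training`, `min_confidence` and `forwarded_by_user` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class FieldResult:
    value: Optional[str]   # None if nothing was recognized
    confidence: float      # assumed range 0.0 .. 1.0


def needs_training(
    results: Dict[str, FieldResult],
    required_fields: Iterable[str],
    min_confidence: float = 0.8,
    forwarded_by_user: bool = False,
) -> bool:
    """Decide whether a document should go to the training workspace."""
    # Route 2: the validation user forwards the document explicitly.
    if forwarded_by_user:
        return True
    # Route 1: automatic check of every required index field.
    for name in required_fields:
        result = results.get(name)
        if result is None or result.value is None:
            return True
        if result.confidence < min_confidence:
            return True
    return False


if __name__ == "__main__":
    results = {
        "invoice_date": FieldResult("05.01.2023", 0.95),
        "order_number": FieldResult("4711", 0.55),
    }
    print(needs_training(results, ["invoice_date", "order_number"]))
```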
There are different optimization strategies: OCR training can be carried out either in advance or during production.
To get the best results from both modules (freeform text recognition and OCR training), a "voter" downstream of the OCR software compares the results of the two modules and weights them accordingly. The final result, a combination of both modules, achieves a higher text recognition rate than the freeform text recognition module alone.
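One simple way to picture such a voter is a weighted comparison per index field. The sketch below is an assumption about how the weighting could work, not the product's actual algorithm. Each module supplies a `(value, confidence)` pair per field, agreeing values add up their weighted scores, and the value with the highest score wins. The function name `vote` and the weights `w_freeform` and `w_trained` are hypothetical.

```python
def vote(freeform, trained, w_freeform=1.0, w_trained=1.0):
    """Combine per-field candidates from both modules into one result.

    freeform / trained: {field: (value, confidence)}
    Returns {field: value} with the highest weighted score per field.
    """
    final = {}
    for field in sorted(set(freeform) | set(trained)):
        scores = {}
        for candidates, weight in ((freeform, w_freeform), (trained, w_trained)):
            if field in candidates:
                value, confidence = candidates[field]
                # Identical values from both modules reinforce each other.
                scores[value] = scores.get(value, 0.0) + weight * confidence
        final[field] = max(scores, key=scores.get)
    return final


if __name__ == "__main__":
    freeform = {"invoice_date": ("2023-01-05", 0.6), "iban": ("DE00-PLACEHOLDER", 0.9)}
    trained = {"invoice_date": ("2023-01-06", 0.8), "order_number": ("4711", 0.7)}
    print(vote(freeform, trained))
```

With equal weights, the trained position wins the disputed invoice date in this example because its confidence is higher. Raising `w_freeform` would shift the decision toward the freeform module, and fields found by only one module are passed through unchanged.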
In this video, we will show you how to use the OCR training workstation to train and optimize the OCR recognition:
Your inPuncto Team
Telephone: +49 (0) 711 66 188 500