What is OCR?
Optical Character Recognition (OCR) is a technology that converts images containing text into data that can be handled as plain text in any text editor.
RPA processes often include tasks that require OCR. Typically, scanned documents, PDF files, and photos must be recognized so that certain information can be extracted from them, often processed, and entered into some system.
The documents themselves can have a strictly fixed structure, where we can predict the location of certain data on the sheet (e.g. we know in advance the location of the "Name" field in a passport), or a less strictly fixed structure (e.g. a payment bill has a known set of fields, but these fields may appear in different areas in different documents). Sometimes they have no clear structure at all.
How such documents are processed depends on the final task. In general, we can distinguish four main approaches to recognizing text from a picture:
- Converting an image (document) into plain text. This approach recognizes the data in an image and outputs it as a set of characters that you can work with, for example, in a text editor. In this case, determining the meaning of the text is left to a person. For example, if the bot needs to find the amount of a bill in the text, the search algorithm must be written by the bot's developer. Keep in mind that with this kind of recognition, elements of the document's formatting, such as the location of individual words and characters, may be lost.
- Converting an image (document) into plain text with characteristic forms (tables, "key-value" pairs, and similar) highlighted. This method is similar to the previous one, but in the output, besides plain text, we get a separately formatted table or other characteristic form (a list of "key-value" pairs, for example). This means that an algorithm for finding information in such data does not need to extract table elements from the plain text; it can work with the table directly. Nevertheless, it is still up to the developer to determine the meaning of the information.
- Converting an image into text by selecting recognition zones (specifying a recognition pattern). Typically, this approach makes sense for documents with a strictly fixed structure. For example, we work with scans of passports and we know that the passport issue date is always located in one particular area of the document and nowhere else. In this case, we can simply ask the bot to look in that area and return its contents, recognizing only the text found there. This somewhat simplifies determining the meaning of the information, because the picture is divided into areas by semantic criteria (for example, the area with the date of issue of the passport, the area with the name of the passport holder, the area with the place of issue of the passport, and others).
- Converting an image (document) into a data set (or data extraction). You may often encounter the term "data extraction", which captures the meaning of this approach. The output here is not just a set of characters from which we must extract information ourselves, but already transformed data. For example, we know what information is stored in invoices. The result of applying this approach to an invoice will be a set of data like "Seller - Smith's cargo LLC.", "TIN - ..." and so on. Therefore, we do not need to work out the meaning of the recognized text ourselves; we receive the transformed information and can immediately use it for its intended purpose.
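To make the difference between the first and third approaches more concrete, here is a minimal Python sketch. It does not call any real OCR engine or any ElectroNeek activity: the `Word` class, the sample coordinates, and the helper names `to_plain_text`, `find_total`, and `read_zone` are all assumptions made up for illustration. The sketch assumes that recognition has already produced a list of words with their positions on the page.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and the position of its top-left corner (hypothetical format)."""
    text: str
    x: int
    y: int


# Made-up recognition result for a small invoice; not output from a real OCR vendor.
WORDS = [
    Word("Invoice", 40, 20), Word("No.", 120, 20), Word("1042", 160, 20),
    Word("Seller:", 40, 60), Word("Smith's", 120, 60), Word("Cargo", 190, 60),
    Word("LLC", 250, 60),
    Word("Total:", 40, 100), Word("1,250.00", 120, 100), Word("USD", 210, 100),
]


def to_plain_text(words):
    """Approach 1: flatten the words into plain text; exact positions are lost."""
    lines = {}
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        lines.setdefault(w.y, []).append(w.text)
    return "\n".join(" ".join(parts) for parts in lines.values())


def find_total(text):
    """With plain text, the developer writes the search logic (here, a regex)."""
    match = re.search(r"Total:\s*([\d,]+\.\d{2})", text)
    return float(match.group(1).replace(",", "")) if match else None


def read_zone(words, left, top, right, bottom):
    """Approach 3: return only the text whose position falls inside a fixed zone."""
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    text = to_plain_text(WORDS)
    print(text)
    print(find_total(text))
    print(read_zone(WORDS, 100, 50, 300, 70))
```

With plain text, the meaning ("this number is the total") comes from the search pattern the developer writes. With zones, the meaning comes from the zone itself, which only works when the field always sits in the same place on the page.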
Which OCR services does ElectroNeek support?
At ElectroNeek, we want to give our users the widest possible OCR experience, so we provide a set of solutions that implement all of the described approaches:
- Converting an image (document) into plain text.
- Converting an image (document) into plain text with characteristic forms (tables, "key-value" pairs, and similar) highlighted.
- Converting an image into text by selecting recognition zones (specifying a recognition pattern).
  - The Recognition template activity. This is ElectroNeek's own design.
- Converting an image (document) into a data set (or data extraction).
Is there a charge?
ElectroNeek provides users with a complimentary OCR page package for one year for Google Cloud OCR and Microsoft Cloud OCR. We offer this option for development and testing purposes so that the developer doesn't waste time signing up for an account with the vendor and can focus immediately on the final task.
Once a bot is developed and tested, the question arises of moving it to a production environment. Bots are executed in the production environment through the Bot Runner tool. Since this tool is free and does not require authorization in our system to execute bots, it does not provide for the use of the complimentary package. In this case, you can connect your own account with the vendor being used, so that payment is made on the vendor's side. The exception is when the user is already authorized in Bot Runner: then the complimentary package can be used. However, the number of pages in the complimentary package is limited, so it will not be enough for production, and in any case you will need to connect your own account with the vendor.
As for the other OCR vendors available in the system, ElectroNeek provides connectors that allow you to connect your own account with these vendors. This is also because these vendors work with different types of documents, have flexible billing plans, and may release regular updates. The connector lets you always use the up-to-date version provided by the vendor and easily change your package with that vendor without affecting your integration with the bot.