Import images/pdfs and OCR

Ami · 02-14-2021 07:14 PM

Hello
Is there a way, perhaps with the new automation tools, to upload a batch of files (import) and then have the app run OCR on each image/PDF to extract data from it?

My scenario is this:
I have a lot of PDFs of previous inspections that are not digitized, and I want to add those past inspections to my database. Right now someone has to take a picture of each PDF manually and save it in order to capture the data. As I may have 1000 or more older inspections, that may not be worth the work time.

Yanjin_Long

Hi Ami,

We have a new feature that lets you upload documents as a Drive folder (currently we support invoices, receipts, and W9 forms); we then process these documents and extract their data into a table. For the inspection PDFs you mentioned, this new feature may not be helpful, as inspection documents aren’t a type of document that we can support at this time. However, we will be looking into expanding the features of our Document Solution, and we appreciate you sharing this use case with us. We will keep you updated as new features are rolled out.

Ami

Thank you for the reply. I will check it out.

Yanjin_Long

Let me know if you have any questions.

Philip_Stephens

Ami, if you need a custom OCR model (rather than one of the prebuilt ones Yanjin mentions), we will be adding that support to automation soon. I will update this thread once it is ready.

srich48

The very top of this thread mentions that you can import PDFs for OCR, but it seems that the OCR models only support images. Am I missing something? Does AppSheet have no way to scan the text in a PDF?

JoesMaker

I have the same impression. OCR does not seem to work with invoice PDF documents, but it works with PNG images. Am I also missing something?

Philip_Stephens

To clarify, there are two distinct features: OCR and Document Extraction.

OCR is for custom-built models and currently runs only in the app, on single-page images (PNG/GIF). It has been in the product for several years.

Document Extraction is a new feature that runs server-side and can be integrated into automation. It currently works only for invoices, receipts, and W9 documents. It can extract from PDF, PNG, GIF, JPG, and TIFF files.

We would like to consolidate these two features into one experience. We will post an announcement once that is completed.
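
For anyone sorting a large batch of files before import, the format split described above can be sketched as a small lookup. This is only an illustration in Python, not AppSheet code: the function name `supported_features` and the two constants are hypothetical, and the check looks at the file extension only. It does not check that a document is an invoice, receipt, or W9 form, or that an image is a single page.

```python
from pathlib import Path

# Formats as listed in this thread; both constant names are hypothetical.
# Only the extensions named in the thread are included (no ".jpeg"/".tif" aliases).
OCR_FORMATS = {".png", ".gif"}
DOC_EXTRACTION_FORMATS = {".pdf", ".png", ".gif", ".jpg", ".tiff"}


def supported_features(filename: str) -> list[str]:
    """Return which of the two features accept this file's format.

    Checks the extension only; document type and page count are not checked.
    """
    ext = Path(filename).suffix.lower()
    features = []
    if ext in OCR_FORMATS:
        features.append("OCR")
    if ext in DOC_EXTRACTION_FORMATS:
        features.append("Document Extraction")
    return features
```

With this sketch, `supported_features("inspection.pdf")` lists only Document Extraction, which matches the reports above that OCR does not accept PDFs.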

Ami

There is a workaround at the moment using Integromat and PDF.co to extract invoice data, and pictures as well.

julianp

Any news on adding PDF support to OCR?

Steve

Not to my knowledge.

mateo

Is there any update maybe? 😊

dpavlov

Any update would be highly appreciated.

Steve

I'm not aware of any change.