
Is there a way to extract data from image or PDF and relate it with respect to the document number and

 

We need to visit an external URL and provide the document number. Each document has an attachment, which can be a PDF or an image. We need to extract data from the PDF/image and prepare a table of the details (tabular output). This table then needs to be pushed to the backend data model.

 

Overall, we want to extract data from the image/PDF, relate it to the PO number/accounting document number, and push this data to the backend data model.

There is no way to do this with Celonis out-of-the-box features. The only options are regex on PDFs with searchable text (which I would not recommend at all), or AI/image recognition with Python (ML Workbench) or an RPA vendor (UiPath could definitely do it). Once the information is retrieved, pushing it to Celonis via the API is a piece of cake.
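To illustrate the regex route on searchable text, here is a minimal sketch. It assumes the text has already been pulled out of the PDF by some PDF text library, and the field labels ("PO Number", "Invoice Date", "Total") and the patterns are hypothetical; real documents will need their own, which is a big part of why this approach is brittle.

```python
import re

# Hypothetical field patterns; every document layout needs its own set.
PATTERNS = {
    "po_number": re.compile(r"PO\s*Number[:\s]+(\d+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Invoice\s*Date[:\s]+(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total_amount": re.compile(r"Total[:\s]+([\d.,]+)", re.IGNORECASE),
}


def extract_fields(document_number, text):
    """Return one table row for a document; fields not found stay None."""
    row = {"document_number": document_number}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else None
    return row


# Made-up sample text standing in for the text layer of a searchable PDF.
sample_text = "PO Number: 4500012345\nInvoice Date: 2023-05-17\nTotal: 1,250.00 EUR"
row = extract_fields("1900000042", sample_text)
print(row)
```

Keeping the document number in every row is what lets you relate the extracted values back to the PO/accounting document later.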


I am thinking of a UiPath solution to connect to the website and extract the data. Can you share information on the last part, "Once the information is retrieved, pushing it to Celonis via API"? How can we do that?


Two seconds of googling would bring you multiple results...

Link 1 Data push API: Getting Started (celonis.com)

Link 2 PyCelonis: Data Upload - PyCelonis
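As a rough sketch of the push step with PyCelonis, assuming PyCelonis 2.x and pandas are installed: the extracted values are flattened into rows keyed by document number and uploaded as a table to a data pool. The base URL, API token, pool name, table name, and the shape of the `extracted` dictionary are all placeholders of mine; check the calls against the PyCelonis documentation linked above before relying on them.

```python
def build_rows(extracted):
    """Flatten {document_number: {field: value}} into a list of row dicts."""
    rows = []
    for document_number, fields in extracted.items():
        row = {"DOCUMENT_NUMBER": document_number}
        # Upper-case the column names (a naming choice, not a requirement).
        row.update({key.upper(): value for key, value in fields.items()})
        rows.append(row)
    return rows


def push_to_celonis(rows, base_url, api_token, pool_name, table_name):
    """Upload rows as a table to a data pool (assumes PyCelonis 2.x)."""
    # Imported lazily so build_rows works without these packages installed.
    import pandas as pd
    from pycelonis import get_celonis

    celonis = get_celonis(base_url=base_url, api_token=api_token)
    data_pool = celonis.data_integration.get_data_pools().find(pool_name)
    # drop_if_exists=True replaces an existing table of the same name.
    data_pool.create_table(
        df=pd.DataFrame(rows), table_name=table_name, drop_if_exists=True
    )


# Made-up extraction result, e.g. what a UiPath robot or script would hand over.
extracted = {
    "1900000042": {"po_number": "4500012345", "total_amount": "1250.00"},
    "1900000043": {"po_number": "4500012346", "total_amount": "980.50"},
}
rows = build_rows(extracted)
print(rows)
```

A call would then look like `push_to_celonis(rows, "https://<team>.<realm>.celonis.cloud", "<api token>", "<pool name>", "EXTRACTED_DOCUMENTS")`, after which the table still has to be added to the data model and the data model reloaded.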

 

Maybe you can get away with an open source project like invoice2data (PyPI). It's much simpler to try that than to set up a whole infrastructure and build robots in an RPA framework like REF (Robotic Enterprise Framework), which is pricey.

 

If you didn't even know about those options, you'll probably need some technical support (devs/tech leads), as these topics are tricky and require a lot of technical know-how (not only in writing code, but also in general architecture and maintenance).

 

Mark as best answer if that helped.

 

Best Regards,

Mateusz Dudek

