
Hello Community,

We are developing a new algorithm to help users detect duplicates.

As inspiration we took this blog post:

Celonis

5 Steps to Reduce Costs Caused by Duplicate Invoices

Celonis customers have already saved millions by taking action on duplicate invoices. Find out how to use fuzzy matching to find duplicates and save money using the Celonis Intelligent Business Cloud and Python.

We have noticed that the algorithm produces false positives, which is to be expected because it relies on fuzzy matching and searching. The users would now like to flag these false positives in the sheet, and they want the flags they set to be visible to other users of the same sheet. How would you implement such a flagging system?

One idea we had was to create an additional column in the table that users can edit in the frontend. That way, the adjustments would be visible to all users. Does anybody know whether this is possible?
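To make the idea concrete, here is a minimal sketch of what we have in mind, written in plain Python with an in-memory SQLite table standing in for whatever shared storage would be available. The table name `duplicate_flags`, the column names, and all function names are our own assumptions for illustration and not part of any Celonis API. The flags live in a separate table keyed by the invoice pair and are joined onto the match list, so every user reading the sheet sees the same flags.

```python
import sqlite3


def pair_key(id_a, id_b):
    """Order-independent key, so (A, B) and (B, A) refer to the same match."""
    return tuple(sorted((id_a, id_b)))


def init_flag_store(conn):
    # Hypothetical shared table; one row per flagged invoice pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS duplicate_flags ("
        "invoice_a TEXT, invoice_b TEXT, flagged_by TEXT, "
        "PRIMARY KEY (invoice_a, invoice_b))"
    )


def flag_false_positive(conn, id_a, id_b, user):
    """Record that a user marked this pair as a false positive."""
    a, b = pair_key(id_a, id_b)
    conn.execute(
        "INSERT OR REPLACE INTO duplicate_flags VALUES (?, ?, ?)", (a, b, user)
    )
    conn.commit()


def annotate_matches(conn, matches):
    """Attach the shared flag to every candidate pair shown in the sheet."""
    flagged = {
        (a, b)
        for a, b in conn.execute(
            "SELECT invoice_a, invoice_b FROM duplicate_flags"
        )
    }
    return [
        {"pair": pair_key(a, b), "false_positive": pair_key(a, b) in flagged}
        for a, b in matches
    ]
```

Keeping the flags in their own table, rather than in a column of the match table itself, would mean they survive a re-run of the matching algorithm.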

Happy to hear your thoughts :)

Best,

Paul

Hi Paul,

By default the algorithm can produce false positives. Based on experience from other projects, I would recommend:

  1. Filtering out credit memos (in the latest SAP AP transformation script an activity has been added that makes this very easy)
  2. Fine-tuning the algorithm (you can choose different matching algorithms and different thresholds)
  3. Having a workshop with the users to find patterns in the false positives which can be filtered out
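As a rough illustration of the first two points, here is a minimal sketch in plain Python. It is not the algorithm from the blog post: it uses `difflib.SequenceMatcher` from the standard library as a stand-in matcher, and the field names (`id`, `reference`, `amount`, `is_credit_memo`) are assumptions made up for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity of two strings in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a, b).ratio()


def find_candidates(invoices, threshold=0.9):
    """Return (id, id, score) for invoice pairs that look like duplicates."""
    # Point 1: credit memos are excluded before any matching happens.
    regular = [inv for inv in invoices if not inv["is_credit_memo"]]

    pairs = []
    for left, right in combinations(regular, 2):
        # Only compare invoices with the same amount (assumed rule).
        if left["amount"] != right["amount"]:
            continue
        # Point 2: the threshold controls how fuzzy a match may be.
        score = similarity(left["reference"], right["reference"])
        if score >= threshold:
            pairs.append((left["id"], right["id"], round(score, 3)))
    return pairs
```

Raising `threshold` trades recall for precision: fewer false positives, but more true duplicates with larger typos slip through. Swapping `similarity` for another metric is how a different matching algorithm would be tried.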

This can already improve the results a lot. Of course, it is possible that some false positives remain after this. For daily productive usage we recommend using the Action Engine to distribute matches to the users. When the users process the Signals from the Action Engine, they can mark false positives, which can then be filtered out. Over time this builds up a dataset that can be used to train a machine learning model to classify true and false positives.
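The feedback loop could look roughly like the sketch below. It is plain Python and does not use any Action Engine API; `apply_feedback`, the `pair` and `score` fields, and the shape of the `feedback` mapping are all assumptions for illustration. Reviewed pairs are taken out of the open list and turned into labelled rows, which is the kind of dataset a classifier could later be trained on.

```python
def apply_feedback(candidates, feedback):
    """Split candidates into still-open items and labelled training rows.

    candidates: list of dicts with 'pair' (two invoice ids) and 'score'.
    feedback:   dict mapping frozenset of the two ids to True (real
                duplicate) or False (false positive), as marked by users.
    """
    open_items, labelled = [], []
    for cand in candidates:
        key = frozenset(cand["pair"])  # order of the two ids does not matter
        if key in feedback:
            # Already reviewed: keep as a labelled example, do not show again.
            labelled.append(
                {"score": cand["score"], "is_duplicate": feedback[key]}
            )
        else:
            # Not reviewed yet: still needs to be distributed to a user.
            open_items.append(cand)
    return open_items, labelled
```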

Does this answer your question?

Best regards,

Simon Riezebos

