
Good morning,

 

Recently, at Solvay, we've been trying out Celonis' Data Deduplication ML solution.

 

Now we've reached a point where we want to further customize it, which means we need to understand what is behind it!

 

For example:

[Screenshot from the documentation]

Here, in point 3), it is mentioned that common scanning errors in Invoice References, such as "8" <-> "B", will be checked.
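For readers unfamiliar with this kind of check, here is a minimal sketch of how a similar-character comparison could work in principle. It is not the Celonis implementation: the `CONFUSABLE` mapping and both function names are hypothetical and chosen only for illustration, and the real list of similar characters is exactly what this question is asking about.

```python
# Hypothetical mapping of characters that scanners often confuse.
# This is NOT the list used by Celonis; it is an assumption for illustration.
CONFUSABLE = {"B": "8", "O": "0", "I": "1", "S": "5", "Z": "2"}


def normalize_reference(ref: str) -> str:
    """Upper-case the reference and map each confusable letter to its digit."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in ref.upper())


def looks_like_scan_duplicate(ref_a: str, ref_b: str) -> bool:
    """True if two different references become identical after normalization."""
    return ref_a != ref_b and normalize_reference(ref_a) == normalize_reference(ref_b)


if __name__ == "__main__":
    print(looks_like_scan_duplicate("INV-1008", "INV-100B"))
    print(looks_like_scan_duplicate("INV-1008", "INV-1009"))
```

With a mapping like this, changing which characters count as "similar" would only mean editing the dictionary, which is the kind of tweak asked about here.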

 

Is it possible for me to see (and modify) the list of Similar Characters? How do I access / change this code?

 

Thank you in advance!

 

Best Regards,

 

Vasco Carona, Solvay

Hi Vasco,

 

Thanks for your question! To the best of my knowledge, the pycelonis and pycelonis_apps packages are not open source, so the underlying code is not generally available. Additional information on the package can be found in the documentation (https://celonis.github.io/pycelonis/). A variety of parameters can be customized and configured, but I'm not sure whether you can modify the specific search logic.

 

Please let me know if I can provide any additional information.

Yuchen


Hello Yuchen,

 

That link is indeed where I found the information for the screenshot.

 

Can you let us know what we would need to do to access this part of the search logic and modify it?

Even for our own documentation, understanding exactly what it does, or why certain duplicate invoices are not being found would be valuable.

 

Ideally we would be able to tweak it slightly.

 

Thank you in advance

 

Vasco

 


Hi Vasco,

 

Thanks for your follow-up. To clarify, because the packages are not open source, end users are typically not able to modify the underlying Python code as it is proprietary. It sounds like you may be asking to adjust certain features like the similarity metric match threshold (e.g., allow more loosely related invoices to be matched as potential duplicates) which, to my knowledge, cannot be done with the currently available options.

 

That said, I would encourage you to reach out to your Customer Success Manager to see if there is a way to work around this. We absolutely want to set you up for success, and I'm sure options can be explored if this is required for your use case.

 

Kind regards,

Yuchen


Hi Vasco,

 

You have probably tried this already and it didn't help, but you never know:

 

import inspect

from pycelonis.data_deduplication.duplicate_checker import DuplicateChecker

print(inspect.getsource(DuplicateChecker))
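If the import path above differs in your installed version, or the source is not shipped with the package, the call will raise an error. Below is a slightly more defensive sketch of the same idea. The helper name `describe_source` is my own (hypothetical), and I demonstrate it on the standard-library `json` module because I cannot confirm the exact pycelonis module layout for every version.

```python
import importlib
import inspect


def describe_source(module_name, attr_name=None):
    """Try to locate and read the source of a module, or of one attribute in it.

    Returns a dict with keys: found, path, source, error.
    """
    result = {"found": False, "path": None, "source": None, "error": None}
    try:
        module = importlib.import_module(module_name)
        target = getattr(module, attr_name) if attr_name else module
    except (ImportError, AttributeError) as exc:
        # Module or attribute does not exist in this installation.
        result["error"] = str(exc)
        return result

    result["found"] = True
    try:
        result["path"] = inspect.getsourcefile(target)
        result["source"] = inspect.getsource(target)
    except (OSError, TypeError) as exc:
        # Source is unavailable, e.g. compiled or built-in code.
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    info = describe_source("json", "JSONDecoder")
    print(info["path"])
```

To try it on the deduplication code, you would pass the module path and class name from the snippet above, e.g. `describe_source("pycelonis.data_deduplication.duplicate_checker", "DuplicateChecker")`, and then look at `info["path"]` to find the file on disk.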

 

All the best,

Sasa

