Background info: I have a data transformation script called 'Preprocessing'. In that script, some SQL operations are performed. Afterwards, I retrieve the resulting tables in the ML Workbench to run statistical Python operations. The result is then uploaded back to Celonis, where another data transformation script called 'Postprocessing' enriches the data for visualization. Currently, the Python part takes 4 hours to execute because I compute the Levenshtein ratio for a very large dataset. So my idea is to use delta loads in order to reduce the number of operations. Thanks for your help!
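For context, the expensive step looks roughly like this (simplified sketch; the real column names differ, and rapidfuzz is just one common library for this):

import pandas as pd
from rapidfuzz.distance import Levenshtein

# illustrative data; in the real pipeline df comes from the PQL export
df = pd.DataFrame({"TEXT_A": ["invoice 123", "order 45"], "TEXT_B": ["invoice 124", "order 450"]})

# normalized Levenshtein similarity in [0, 1] for each row pair
df["RATIO"] = [Levenshtein.normalized_similarity(a, b) for a, b in zip(df["TEXT_A"], df["TEXT_B"])]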
Hi, which version of PyCelonis are you using?
For v2.0.1, you can use the following:
from pycelonis.ems import JobType  # JobType ships with PyCelonis 2.x

data_push_job = data_pool.create_data_push_job(target_name="ACTIVITIES", type_=JobType.DELTA)
data_push_job.add_data_frame(df)  # df: pandas DataFrame holding the delta rows
data_push_job.execute(wait=True)  # blocks until the push job has finished
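For completeness, this is roughly how data_pool can be obtained in the ML Workbench (the pool name is a placeholder):

from pycelonis import get_celonis

celonis = get_celonis()  # in the ML Workbench, URL and API token are picked up from the environment
data_pool = celonis.data_integration.get_data_pools().find("MY_DATA_POOL")  # placeholder pool name

As far as I know, a DELTA job appends the pushed records to the existing table, instead of dropping and recreating it as a JobType.REPLACE job would.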
Hi Rio, thanks for your fast response!
We indeed use v2.0.1; however, the snippet you provided is meant for the upload AFTER the Python operations, right? I'm rather looking for something where I can do a delta load BEFORE the Python operations take place, i.e. when I retrieve my table using a PQL query.
As an alternative, I think the native way would be to restrict the data retrieval itself from the start, right?
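Something like the following is what I have in mind, via the PQL module (table/column names and the filter condition are made up):

from pycelonis.pql import PQL, PQLColumn, PQLFilter

query = PQL()
query += PQLColumn(name="CASE_ID", query='"ACTIVITIES"."CASE_ID"')
query += PQLColumn(name="TEXT", query='"ACTIVITIES"."TEXT"')

# only pull rows changed since the last run (placeholder condition)
query += PQLFilter(query='FILTER "ACTIVITIES"."CHANGED_AT" > TO_TIMESTAMP(\'2024-01-01\', \'YYYY-MM-DD\');')

df = data_model.export_data_frame(query)  # data_model: the data model containing the table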
Thank you and best regards,
Not sure what you mean by AFTER the Python operations. You can use the delta upload anytime you want.
Another alternative is using the Data Push API: https://docs.celonis.com/en/data-push-api.html
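The flow there is roughly: create a push job, upload parquet chunks, then submit the job. A rough sketch with requests, based on my reading of the linked docs (team URL, pool ID, and token are placeholders; please verify the exact endpoints and payloads on that page):

import requests

BASE = "https://<team>.<cluster>.celonis.cloud/integration/api/v1/data-push"  # placeholder team URL
POOL_ID = "<data-pool-id>"
HEADERS = {"Authorization": "Bearer <api-token>"}

# 1) create a DELTA push job
job = requests.post(f"{BASE}/{POOL_ID}/jobs/", headers=HEADERS,
                    json={"targetName": "ACTIVITIES", "type": "DELTA", "dataPoolId": POOL_ID}).json()

# 2) upload the rows as a parquet chunk
with open("delta_rows.parquet", "rb") as f:
    requests.post(f"{BASE}/{POOL_ID}/jobs/{job['id']}/chunks/upserted",
                  headers=HEADERS, files={"file": f})

# 3) submit the job for execution
requests.post(f"{BASE}/{POOL_ID}/jobs/{job['id']}/", headers=HEADERS)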
Hi Peter, I know it's been a while since this question, but I'm facing a similar problem. Did the data-push API resolve it for you, or did you find another workaround?
Hey Carnem,
Tbh, I don't remember anymore... I think I solved it by splitting the SQL script into smaller ones, some with delta loads, and then used the Data Push API.
Hope this helps!