Solved

Hey everyone, do you know if it is possible to do a delta load for tables I want to retrieve in the ML Workbench using PyCelonis?

  • March 7, 2023
  • 5 replies
  • 17 views

Background info: I have a data transformation script called 'Preprocessing', in which some SQL operations are done. Afterwards, I retrieve the resulting tables in the ML Workbench to run statistical Python operations. The result is then uploaded back to Celonis, where another data transformation script called 'Postprocessing' enriches the data for visualization. Currently, the Python part takes 4 hours to execute because I compute the Levenshtein ratio on a very large dataset. So my idea is to use delta loads to reduce the number of operations. Thanks for your help!

Best answer by rio.cinco12

5 replies

  • Level 8
  • 57 replies
  • March 7, 2023

Hi, which version of PyCelonis are you using?

For v2.0.1, you can use the following:

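from pycelonis.ems import JobType  # assumed import path for JobType in PyCelonis 2.x

# data_pool is an existing data pool object; df is the pandas DataFrame to upload
data_push_job = data_pool.create_data_push_job(target_name="ACTIVITIES", type_=JobType.DELTA)
data_push_job.add_data_frame(df)
data_push_job.execute(wait=True)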

https://celonis.github.io/pycelonis/2.0.1/tutorials/executed/02_data_integration/05_data_push_pull_advanced/#341-chunking-for-parquet-files


  • Author
  • Level 1
  • 2 replies
  • March 7, 2023

Hi Rio, thanks for your fast response!

 

We are indeed on v2.0.1. However, the snippet you provided is meant for the upload AFTER the Python operations, right? I'm rather looking for a way to do a delta load BEFORE the Python operations take place, i.e. when I retrieve my table using a PQL query.

As an alternative, I think the native way would be to restrict the data retrieval from the start, right?

 

Thank you and best regards,
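
To illustrate the idea of restricting the retrieval to new data only, here is a minimal sketch of a watermark-based delta selection in plain Python. It is not PyCelonis code: the column name `LAST_CHANGED`, the helper `select_delta` and the sample rows are hypothetical, and in practice the same condition would more likely be expressed as a filter in the PQL query or in the SQL transformation.

```python
from datetime import datetime


def select_delta(rows, last_watermark, ts_column="LAST_CHANGED"):
    """Return the rows changed after last_watermark, plus the new watermark.

    rows is a list of dicts; ts_column is a hypothetical change-timestamp column.
    """
    new_rows = [row for row in rows if row[ts_column] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((row[ts_column] for row in new_rows), default=last_watermark)
    return new_rows, new_watermark


# Hypothetical sample data standing in for the table pulled from Celonis.
rows = [
    {"ID": 1, "NAME": "Acme Corp", "LAST_CHANGED": datetime(2023, 3, 1)},
    {"ID": 2, "NAME": "Acme Corp.", "LAST_CHANGED": datetime(2023, 3, 6)},
    {"ID": 3, "NAME": "Globex", "LAST_CHANGED": datetime(2023, 3, 7)},
]

# Only rows changed after the last successful run are passed on to the Python step.
delta_rows, watermark = select_delta(rows, datetime(2023, 3, 5))
print([row["ID"] for row in delta_rows])
```

The returned watermark would have to be stored somewhere between runs (a file or a small table) so that the next run only picks up rows changed since then.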


  • Level 8
  • 57 replies
  • Answer
  • March 7, 2023

I'm not sure what you mean by AFTER the Python operations. You can use the delta upload at any point.

Another alternative is the Data Push API: https://docs.celonis.com/en/data-push-api.html


carmem.caval
  • Level 4
  • 7 replies
  • June 25, 2024

Hi Peter, I know it's been a while since this question, but I'm facing a similar problem. Did the data-push API resolve it for you, or did you find another workaround?


  • Author
  • Level 1
  • 2 replies
  • June 26, 2024

Hey Carmem,

Tbh, I do not remember anymore... I think I solved it by splitting the SQL script into smaller ones, some of them with delta loads, and then used the Data Push API.

 

Hope this helps..
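
For the long-running Levenshtein step from the original question, one way to apply the delta idea on the Python side is to score only the string pairs that have not been scored in an earlier run. The sketch below is an illustration under assumptions, not the poster's actual solution: the helper names and the in-memory `cache` dict are hypothetical, and the similarity is defined here as 1 - distance / longest length, which may differ from the ratio a dedicated Levenshtein library returns.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; an assumed definition, see lead-in."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def score_delta(pairs, cache):
    """Score only pairs missing from cache; return how many were computed."""
    computed = 0
    for pair in pairs:
        if pair not in cache:
            cache[pair] = similarity(*pair)
            computed += 1
    return computed


cache = {}  # would be persisted between runs in a real setup
first_run = score_delta([("kitten", "sitting"), ("Acme Corp", "Acme Corp.")], cache)
# Second run: one pair is already cached, so only the new pair is scored.
second_run = score_delta([("kitten", "sitting"), ("Globex", "Globex Inc")], cache)
print(first_run, second_run)
```

With a persisted cache, each run costs roughly in proportion to the number of new or changed pairs rather than the full dataset.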