Hey Guys, Do you know how to join two pql dataframes in Celonis MLWB? The objects are of class: 'pycelonis.pql.data_frame.DataFrame'. I tried to convert them to pandas dataframes using to_pandas() but my kernel dies and restarts.

Hi Kunal! I tried to replicate your issue and noticed that my current partner instance is not enabled for Data Export. Running df.to_pandas() results in an error saying: "Data Export not enabled. Please contact Celonis customer support." My kernel doesn't die, though. You may need to contact support.


I would approach it similarly to you:

  1. convert both to pandas dataframes
  2. join them in pandas
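A minimal sketch of the join step, assuming both exports succeeded. The two frames and their column names (case_id, net_value, activity) are made-up stand-ins for whatever your to_pandas() calls return:

```python
import pandas as pd

# Hypothetical stand-ins for the two frames returned by to_pandas().
cases = pd.DataFrame(
    {"case_id": ["A", "B", "C"], "net_value": [100.0, 250.0, 75.0]}
)
activities = pd.DataFrame(
    {"case_id": ["A", "A", "B"], "activity": ["Create", "Approve", "Create"]}
)

# A left join keeps every case, including those without a matching activity.
joined = cases.merge(activities, on="case_id", how="left")
print(joined)
```

Choose `how` ("inner", "left", "outer") according to which rows you need to keep. Note that the merged frame lives in RAM alongside the two inputs.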

However, if I am not mistaken, the actual data export is triggered at the moment you call to_pandas(). Several things can go wrong during the export itself, for example missing access rights, as in Chris's case. When an error message appears, that certainly helps narrow it down.

 

If the kernel simply died on me, I would first double-check the RAM that was allocated when the MLWB project was created and increase it, or at least closely monitor the consumption in the MLWB UI while executing the notebook.
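Besides watching the UI, you can estimate the footprint from inside the notebook. This is a rough sketch using plain pandas: export a small slice first, measure it, and extrapolate. The helper name frame_size_mb, the sample data, and the total row count are all invented for illustration:

```python
import pandas as pd


def frame_size_mb(df: pd.DataFrame) -> float:
    """Return the in-memory size of a pandas DataFrame in megabytes."""
    return float(df.memory_usage(deep=True).sum()) / 1024**2


# Hypothetical small slice standing in for a limited export.
sample = pd.DataFrame(
    {
        "case_id": [f"C{i}" for i in range(1000)],
        "net_value": [float(i) for i in range(1000)],
    }
)

size_mb = frame_size_mb(sample)

# Assumed total row count of the full table; replace with your own figure.
total_rows = 5_000_000
estimated_full_mb = size_mb * total_rows / len(sample)

print(f"sample: {size_mb:.3f} MB, estimated full export: {estimated_full_mb:.0f} MB")
```

The extrapolation is only linear and ignores the temporary copies a merge creates, so treat it as a lower bound.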


Ya, I tried to convert the individual pql series to pandas series and saw that I was running out of RAM. I will need to get it upgraded to a higher number.

 

Is there an alternative to consume less memory and join pql data frames rather than pandas data frames?


I didn't see any other merge/join/append type functions native to SaolaPy in the API doc: data_frame - PyCelonis

 

Another thought would be to create a view or table in Vertica and add that to your data model. Data objects can exist in data models without having relationships, so you could add the new object simply for the sake of being able to manipulate it within ML Workbench. Definitely test this out for performance if your data model is huge.

 

As far as the out of memory issue goes, make sure to only bring in the columns and rows you need for Python-based work, or consider using a data sampling method to work with a smaller dataset.
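To illustrate the idea with plain pandas (a sketch only; ideally you restrict columns and rows in the PQL query itself, before to_pandas(), so the data never reaches the notebook). The frame and column names below are made up:

```python
import pandas as pd

# Hypothetical wide export with a column that is not needed for the join.
raw = pd.DataFrame(
    {
        "case_id": [f"C{i}" for i in range(10_000)],
        "country": ["DE", "US", "FR", "IN"] * 2_500,
        "net_value": [float(i) for i in range(10_000)],
        "comment": ["free text not needed for the join"] * 10_000,
    }
)

# Keep only the columns the Python work actually needs.
slim = raw[["case_id", "country", "net_value"]].copy()

# Low-cardinality strings take far less memory as categoricals.
slim["country"] = slim["country"].astype("category")

# Downcast floats to the smallest float type that can hold the values.
slim["net_value"] = pd.to_numeric(slim["net_value"], downcast="float")

# Work on a reproducible 10% random sample while developing.
subset = slim.sample(frac=0.1, random_state=42)

print(raw.memory_usage(deep=True).sum(), slim.memory_usage(deep=True).sum())
```

Downcasting trades precision for memory, so check that float32 is acceptable for your values before relying on it.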


Agreed with Chris. The only way left is to do the join in the Celonis EMS and not in the small container that runs the Jupyter notebook.

As an alternative to creating something in Vertica SQL, you could also perform the join in the "frontend" via PQL. For example, use a Knowledge Model and add PQL columns step by step as you need them, or create an OLAP table in an Analysis and extract the data you need from there.

 

And also I would like to emphasise what Chris said. Try to limit the data with filters or sampling to reduce RAM usage.

