Skip to main content

Hi Team,

Im trying to convert the data i need from the datapool into a pandas dataframe for further calculation.

In the documentation i could only find how to read the parquet files and convert into Pandas.

Is there any way?

Let me know

Thanks

 

Hello,

We pull data from a the data model in a data pool. The following code works in pycelonis 2.0 or higher. The output is a pandas data frame, the input is a data model object with the table that you want to export and the tablename.

 

Hint: You have to execute a query over the data model to extract (export) data, so we are dinamically creating the SQL with a "for" statement. This helps in case we change the table definition so we don't need to change the script.

 

def getDataFrameFromDataModel(dataModel, tableName):

 

  dataPoolTable = dataModel.get_tables().find(tableName)

  from pycelonis.pql import PQL, PQLColumn, PQLFilter

  query = PQL()

  cel_name = ""

  for colx in dataPoolTable.get_columns():

    col = colx.name

    name = f'{col}'

    cel_name = f'"{tableName}"."{col}"'

    query += PQLColumn(query=cel_name , name=name)

  # print(query)

  data_frame = dataModel.export_data_frame(query)

  data_framea'source'] = tableName

#   print(data_frame.head())

   

  return data_frame

 

We are using a global constant for the data model name. The code below pulls the data model:

 

  dataModelName = DATA_MODEL

  dataModel = dataPool.get_data_models().find(dataModelName)

 

This is the sample in API documentation that we use as base for the code above:

 

https://celonis.github.io/pycelonis/2.0.1/tutorials/executed/02_data_integration/03_data_pull/#33-pqlfilter-objects

 

Regards, Esteban.


Hi @esteban.leiva12 ,

Thank you for your reply.

I tried using this, but i want to get the data from the connections/files.

For ex, i have uploaded an excel via the upload files option into the connection.

I need to access the data from that excel as dataframes in ML, is there any such option?

Please let me know.

Thank you


Hi @Rahul Bhat , I don't really know. It seems like trying to get data from the data source.

The file that you uploaded created a table in the data pool. You can create a dummy data model with that table created with the excel file. This would work.

I tried to pull data from tables in the data pool but I didn't find any option for that. As far as I know, the only way to extract data is via query (export) to the data model.


Hi @Rahul Bhat , I don't really know. It seems like trying to get data from the data source.

The file that you uploaded created a table in the data pool. You can create a dummy data model with that table created with the excel file. This would work.

I tried to pull data from tables in the data pool but I didn't find any option for that. As far as I know, the only way to extract data is via query (export) to the data model.

Hi @Rahul Bhat & @esteban.leiva12,

 

This is indeed the way to go. Since the PyCelonis API v.2+, only data from data models can be retrieved, previously this was also possible from transformations, but only for a limited number of rows.

 

Best regards,

Jan-peter


Hi @Rahul Bhat & @esteban.leiva12,

 

This is indeed the way to go. Since the PyCelonis API v.2+, only data from data models can be retrieved, previously this was also possible from transformations, but only for a limited number of rows.

 

Best regards,

Jan-peter

Hi @janpeter.van.d @esteban.leiva12 

Thank you for your comments.

Yes i was under the impression that the data could still be taken directly from the datapool, i was not aware we cant do that any more.

I will try taking data from datamodel and then insert it back to the table in the datamodel.

@esteban.leiva12 The code is still a bit confusing for me and its giving out a few errors. Il try solving it and if not i might drop another comment for your help.

Thank you


Found the answer

 

columns=y.get_columns()

from pycelonis.pql import PQL,PQLColumn,PQLFilter,OrderByColumn

query= PQL()

for column in columns:

  query.add(PQLColumn(name=column.name,query=f""" "{y.name}"."{column.name}" """))

df=h.export_data_frame(query)


Found the answer

 

columns=y.get_columns()

from pycelonis.pql import PQL,PQLColumn,PQLFilter,OrderByColumn

query= PQL()

for column in columns:

  query.add(PQLColumn(name=column.name,query=f""" "{y.name}"."{column.name}" """))

df=h.export_data_frame(query)

What is 'h' in last line(df=h.export_data_frame(query)) ?


Reply