How to convert table from the datapool into pandas dataframe?

Hi Team,

Im trying to convert the data i need from the datapool into a pandas dataframe for further calculation.

In the documentation i could only find how to read the parquet files and convert into Pandas.

Is there any way?

Let me know

Thanks

Page 1 / 1

Hello,

We pull data from a the data model in a data pool. The following code works in pycelonis 2.0 or higher. The output is a pandas data frame, the input is a data model object with the table that you want to export and the tablename.

Hint: You have to execute a query over the data model to extract (export) data, so we are dinamically creating the SQL with a "for" statement. This helps in case we change the table definition so we don't need to change the script.

def getDataFrameFromDataModel(dataModel, tableName):

dataPoolTable = dataModel.get_tables().find(tableName)

from pycelonis.pql import PQL, PQLColumn, PQLFilter

query = PQL()

cel_name = ""

for colx in dataPoolTable.get_columns():

col = colx.name

name = f'{col}'

cel_name = f'"{tableName}"."{col}"'

query += PQLColumn(query=cel_name , name=name)

# print(query)

data_frame = dataModel.export_data_frame(query)

data_framea'source'] = tableName

# print(data_frame.head())

return data_frame

We are using a global constant for the data model name. The code below pulls the data model:

dataModelName = DATA_MODEL

dataModel = dataPool.get_data_models().find(dataModelName)

This is the sample in API documentation that we use as base for the code above:

https://celonis.github.io/pycelonis/2.0.1/tutorials/executed/02_data_integration/03_data_pull/#33-pqlfilter-objects

Regards, Esteban.

Hi @esteban.leiva12 ,

Thank you for your reply.

I tried using this, but i want to get the data from the connections/files.

For ex, i have uploaded an excel via the upload files option into the connection.

I need to access the data from that excel as dataframes in ML, is there any such option?

Please let me know.

Thank you

Hi @Rahul Bhat , I don't really know. It seems like trying to get data from the data source.

The file that you uploaded created a table in the data pool. You can create a dummy data model with that table created with the excel file. This would work.

I tried to pull data from tables in the data pool but I didn't find any option for that. As far as I know, the only way to extract data is via query (export) to the data model.

Hi @Rahul Bhat , I don't really know. It seems like trying to get data from the data source.

The file that you uploaded created a table in the data pool. You can create a dummy data model with that table created with the excel file. This would work.

I tried to pull data from tables in the data pool but I didn't find any option for that. As far as I know, the only way to extract data is via query (export) to the data model.

Hi @Rahul Bhat & @esteban.leiva12,

This is indeed the way to go. Since the PyCelonis API v.2+, only data from data models can be retrieved, previously this was also possible from transformations, but only for a limited number of rows.

Best regards,

Jan-peter

Hi @Rahul Bhat & @esteban.leiva12,

This is indeed the way to go. Since the PyCelonis API v.2+, only data from data models can be retrieved, previously this was also possible from transformations, but only for a limited number of rows.

Best regards,

Jan-peter

Hi @janpeter.van.d @esteban.leiva12

Thank you for your comments.

Yes i was under the impression that the data could still be taken directly from the datapool, i was not aware we cant do that any more.

I will try taking data from datamodel and then insert it back to the table in the datamodel.

@esteban.leiva12 The code is still a bit confusing for me and its giving out a few errors. Il try solving it and if not i might drop another comment for your help.

Thank you

Found the answer

columns=y.get_columns()

from pycelonis.pql import PQL,PQLColumn,PQLFilter,OrderByColumn

query= PQL()

for column in columns:

query.add(PQLColumn(name=column.name,query=f""" "{y.name}"."{column.name}" """))

df=h.export_data_frame(query)

Found the answer

columns=y.get_columns()

from pycelonis.pql import PQL,PQLColumn,PQLFilter,OrderByColumn

query= PQL()

for column in columns:

query.add(PQLColumn(name=column.name,query=f""" "{y.name}"."{column.name}" """))

df=h.export_data_frame(query)

What is 'h' in last line(df=h.export_data_frame(query)) ?

Reply

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded