
I need to review an ETL process in Celonis in detail. At a minimum, I would like to download all the scripts to my desktop and, if possible, all the table definitions as well (raw data, temporary tables, views, and the data model).

 

Is that possible? TIA

Have a look at Pycelonis:

https://celonis.github.io/pycelonis/1.6.1/reference/celonis_api/event_collection/data_pool/#celonis_api.event_collection.data_pool.Pool

 

You can get Data Model, Data Pool and Data Job data via the API.

If you are not familiar with the API, it also has tutorials:

https://celonis.github.io/pycelonis/1.6.1/getting_started/installation/
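Here is a rough sketch of how the API could cover the "download all the scripts" part of the question. The function below walks a pool's data jobs and writes each transformation's SQL into its own `.sql` file. The helper names (`export_transformation_scripts`, `safe_filename`) are hypothetical. The attributes it reads (`pool.data_jobs`, `job.transformations`, `.name`, `.statement`) are assumptions about the PyCelonis 1.x object model, so please verify them in the reference linked above.

```python
import re
from pathlib import Path


def safe_filename(name: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_") or "unnamed"


def export_transformation_scripts(pool, target_dir) -> list:
    """Write each transformation's SQL to <target_dir>/<job>/<NN>_<name>.sql.

    ASSUMPTION: pool.data_jobs, data_job.transformations and the .name /
    .statement attributes follow the PyCelonis 1.x object model; check them
    against the Pool / DataJob reference pages before relying on this.
    """
    target = Path(target_dir)
    written = []
    for job in pool.data_jobs:
        job_dir = target / safe_filename(job.name)
        job_dir.mkdir(parents=True, exist_ok=True)
        for i, t in enumerate(job.transformations):
            # The numeric prefix keeps list order and avoids clashes between equal names
            path = job_dir / f"{i:02d}_{safe_filename(t.name)}.sql"
            path.write_text(t.statement or "", encoding="utf-8")
            written.append(path)
    return written


# Hypothetical usage (connection details depend on your setup):
# from pycelonis import get_celonis
# celonis = get_celonis()
# pool = celonis.pools.find("My Pool")
# export_transformation_scripts(pool, Path.home() / "Desktop" / "celonis_scripts")
```

If you run this locally with PyCelonis installed, the files land directly on your machine. If you run it in the Machine Learning Workbench, you still have to download the folder afterwards.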

 

Best Regards,

Paul Velthuis


Thx! I had looked into PyCelonis a bit before, but I got the wrong idea that it only worked against the data model... now I see it covers the extraction side too. 👍


You can also use the content-cli program in the Linux terminal inside the Machine Learning Workbench app to get the data pool configuration. Unfortunately, the data itself is only available through Python: either as Parquet files, or by exporting a table into a pandas DataFrame, saving it on the workbench's Linux machine, and downloading it to your computer.

 

More info: <your-celonis-link>/help/display/CIBC/Content-CLI+as+a+content+management+tool+in+the+EMS
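Here is a minimal sketch of the "save it on the Linux machine" step. It assumes you already have the tables as pandas DataFrames, for example from `get_data_frame()`. The hypothetical helper `save_frames_for_download` writes each DataFrame to a CSV file in one folder, which you can then download to your computer. It relies only on pandas' `DataFrame.to_csv`. `to_parquet` would be an alternative, but only if pyarrow or fastparquet is installed in the workbench.

```python
from pathlib import Path


def save_frames_for_download(frames: dict, out_dir) -> list:
    """Write each DataFrame in `frames` (table name -> DataFrame) to <out_dir>/<name>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = out / f"{name}.csv"
        # index=False keeps pandas' row index out of the exported file
        df.to_csv(path, index=False)
        paths.append(path)
    return paths
```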


Thx Mateusz.

 

Anyway, I had a nice weekend fighting with Python and PyCelonis, and I figured out how to do it with the API rather than the content-cli.

Basically, you create a temporary data job:

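import datetime
import sys

# Assumes `pool` (a PyCelonis data pool) and `pool_name` are already defined.
# Unique name so repeated runs don't collide
tempname = "TemporaryPython" + datetime.datetime.now().strftime('%f')
try:
    data_job = pool.create_data_job(tempname)
except Exception as e:
    print("DATAJOB CRITICAL: Can't create temporary datajob %s in pool %s (%s)" % (tempname, pool_name, e))
    sys.exit(1)  # non-zero exit code signals the failure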

Then create a temporary transformation with the SQL statement you want to run:

 

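try:
    tempscript = data_job.create_transformation(
        'tempPythonScript',
        statement=sql_statement,
        description='Temporary Job from python')
except Exception:
    # This snippet runs inside a function; Output is a result class defined elsewhere in the script
    Output.status = '1'
    Output.tests = dict()
    return Output()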
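For the table-definition part of the original question, `sql_statement` could be a catalog query. This is a sketch under a loud assumption: the data pool's SQL engine is Vertica, and its `v_catalog.columns` and `v_catalog.views` system tables are readable from a transformation. Neither point is confirmed in this thread, and the helper names are hypothetical.

```python
def column_definitions_sql(table_names=None) -> str:
    """Build a query listing column name/type per table.

    ASSUMPTION: the pool runs on Vertica and exposes v_catalog.columns
    to transformations; adjust the catalog names if your pool differs.
    """
    sql = (
        "SELECT table_schema, table_name, column_name, data_type, ordinal_position "
        "FROM v_catalog.columns"
    )
    if table_names:
        # Escape single quotes so names are valid SQL string literals
        quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in table_names)
        sql += f" WHERE table_name IN ({quoted})"
    return sql + " ORDER BY table_schema, table_name, ordinal_position"


def view_definitions_sql() -> str:
    """Same Vertica assumption: fetch the SQL text behind each view."""
    return (
        "SELECT table_schema, table_name, view_definition "
        "FROM v_catalog.views ORDER BY table_schema, table_name"
    )
```

For example, set `sql_statement = column_definitions_sql(["ACTIVITIES"])` (the table name is only a placeholder) before creating the temporary transformation.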

Then you run it and fetch the result with get_data_frame():

df = tempscript.get_data_frame()

 

(now teaching myself how to deal with pandas' data frames... 😃 )
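The steps above can be wrapped into one function. This sketch reuses the calls from the post (`create_data_job`, `create_transformation`, `get_data_frame`). It adds a `try`/`finally` so the temporary job is removed even when the SQL fails. ASSUMPTION: the data job object offers a `delete()` method, so check your PyCelonis version before relying on the cleanup. The function name `run_sql_via_temp_job` is hypothetical.

```python
import datetime


def run_sql_via_temp_job(pool, sql_statement):
    """Run `sql_statement` through a throw-away data job and return its DataFrame.

    ASSUMPTION: data_job.delete() exists and removes the job from the pool.
    """
    tempname = "TemporaryPython" + datetime.datetime.now().strftime("%f")
    data_job = pool.create_data_job(tempname)
    try:
        tempscript = data_job.create_transformation(
            "tempPythonScript",
            statement=sql_statement,
            description="Temporary Job from python",
        )
        return tempscript.get_data_frame()
    finally:
        # Always clean up so temporary jobs don't pile up in the pool
        data_job.delete()
```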

