
Hi all,

 

I'm desperately looking for a way to start a data job directly via a Python statement within the ML Workbench (IBC).

I succeeded in starting a data model load via the api_request() function by providing the respective pool and model IDs, but I couldn't implement a similar logic for executing a data job from the ML Workbench.
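For context, the data model call that already works looks schematically like this (placeholders in [brackets]; the endpoint path is written from memory here, so treat it as a sketch rather than the definitive API):

from pycelonis import get_celonis

celonis = get_celonis()

# Placeholders in [brackets] must be replaced with your own values;
# the endpoint path below is from memory and may differ for your team.
response = celonis.api_request('https://[team URL]/integration/api/v1/data-pools/[pool ID]/data-models/[data model ID]/reload', method='POST')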

Could you help me out with the correct function and the schema for making the call?

 

Thank you very much for your help!

 

BR

 

David

Hello,

 

It is possible to trigger a data job via pycelonis (I checked the latest version, 1.5.6).

You can find the documentation here: https://celonis.github.io/pycelonis/reference/pycelonis.celonis_api.event_collection.data_job.html#pycelonis.celonis_api.event_collection.data_job.DataJob.execute


Hi Kazuhiko,

 

thanks for your quick reply! I've come across this class/function before, but somehow I couldn't initiate the right call.

Could you post a schematic example of the correct way to start a data job in pycelonis? Some example code would help a lot to make it clear.

Maybe you could also help me understand what the "parent" in the DataJob class is meant to be.

Thank you very much for your help and patience, and sorry for the rather "noob" question!

 

BR

Franz


Hello David,

 

Please use the snippet below.

 

from pycelonis import get_celonis

# Connect to Celonis (inside the ML Workbench the login is picked up automatically)
celonis = get_celonis()

# Find the data pool and the data job by name, then trigger the job
pool = celonis.pools.find('your pool name')
datajob = pool.data_jobs.find('your job name')
datajob.execute()

 


Hi Kazuhiko,

 

thanks a lot for your quick reply and help!

When I run the code at the data job level as you suggested, I get an error code 500, but going one level deeper and executing a specific transformation within the very same data job works just fine.

My overall goal is to run 5 data jobs/transformations in parallel and, after the last of those 5 has finished, return to sequential execution of the remaining data jobs.

Ideally, I'd simply trigger one scheduling task from the ML Workbench in which the data jobs are then executed one after another, and thus more reliably. However, this does not seem possible. How could I get back to normal, automatic sequential data job processing after having started transformations from the ML Workbench?

 

Thank you very much for your help!

 

BR

 

David


Hello David,

 

I am going to investigate handling parallel job execution now. I will get back to you when I have something working.


Hi,

 

as I mentioned, I couldn't run the execute statement at the data job level, but I do have a working solution at the task level.

The code 

 

from pycelonis import get_celonis

celonis = get_celonis()

pool = celonis.pools.find('your pool name')
datajob = pool.data_jobs.find('your job name')

# note the attribute name, and avoid shadowing the built-in 'list'
tasks = datajob.transformations

 

provides a list of all tasks within a data job (including deactivated ones).

One can iterate through this list within a for loop and place the execute statements there. 

However, in a non-parallel setting, i.e. when there are multiple tasks within a data job, the execute statement at the transformation level is like manually running a data job with only one task selected.

After each transformation execution within the for loop, a while loop checks whether the task completed successfully before iterating to the next task in the for loop (see the sketch below).
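A minimal sketch of that sequential pattern, reusing the tasks list from the snippet above (the task_finished helper is hypothetical; wire it up to however your pycelonis version exposes the execution status):

import time

def task_finished(task):
    # Hypothetical placeholder for the actual status check (e.g. querying
    # the execution status via the API); returning True keeps the sketch runnable.
    return True

# Run the tasks strictly one after another
for task in tasks:
    task.execute()
    # Poll until the task reports completion before starting the next one
    while not task_finished(task):
        time.sleep(10)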

 

Regarding parallel execution, the procedure is easier, yet with the limitation that each of the data jobs to be parallelized may only contain a single task. One simply kicks off the different tasks one after another, and a checking loop makes sure that the next data job is only initiated once all parallelized tasks have completed (see the sketch below).
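Schematically, the parallel part then looks like this (parallel_tasks is assumed to hold the single task of each of the data jobs to be parallelized; task_finished is the same hypothetical status check as above):

import time

# Kick off all tasks back to back; note that if execute() blocks until
# completion in your pycelonis version, this would run sequentially instead.
for task in parallel_tasks:
    task.execute()

# Only continue once every parallelized task has completed
while not all(task_finished(task) for task in parallel_tasks):
    time.sleep(10)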

 

I'm still trying to find a way to execute a data job in full with only a single statement, but until then I have a working solution for my initial issue.

 

Thank you very much, and let's stay in touch should you find a more elegant/stable way of handling parallelization.

 

Thanks and BR

 

David


Quick update on the initial issue: after a bug report, the .execute() function is now working as expected in the most recent pycelonis beta.

BR

David


Hi David, I am also trying to run a data job from the machine learning workbench. However, I get a 500 internal server error whenever I try to run execute(). Do you have any idea why?

 

Thanks a lot!

 

Maxine


Hi Maxine,

 

hope I can help!

For our testing purposes, we were advised to use version 1.5.9dev3 of pycelonis. I expect the current production releases to have incorporated the mentioned fix; if not, your way forward may look like the following:

from pycelonis import get_celonis

celonis = get_celonis()

# Placeholders in [brackets] need to be replaced with your own values
response = celonis.api_request('https://[team URL]/integration/api/v1/data-pools/[data pool ID]/data-jobs/[data job ID]/execute?mode=FULL', method='POST')
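If you don't want to hard-code the team URL, you should also be able to build it from the connection object (assuming your pycelonis version exposes the url attribute):

# 'celonis.url' is assumed to hold the team base URL
url = celonis.url + '/integration/api/v1/data-pools/[data pool ID]/data-jobs/[data job ID]/execute?mode=FULL'
response = celonis.api_request(url, method='POST')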

 

Hope I could help! :)

 

Best 

David


Hi David,

 

Thank you so much for your reply!

 

Unfortunately, I get a new error, which seems to be due to the fact that I made the HTTP request too many times (I had a typo before)? Should I just wait?

ConnectionError: HTTPSConnectionPool(host=xxxxxxxxxx): Max retries exceeded with url: /integration/api/v1/data-pools/[my data pool id]/data-jobs/[my data job id]/execute?mode=FULL (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0a6c4f7100>: Failed to establish a new connection: [Errno -2] Name or service not known'))

 


Hi Maxine,

 

it sounds like this one is not in my power to help you with; the "Name or service not known" part suggests the host name in your URL still isn't resolving. You could nevertheless focus on the .execute() function in the short term.

Good luck!

David

