Skip to main content

Hi to all,

if you create a transformation job like this:

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM Sorting = 10

;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM Sorting = 20

;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM Sorting = 30

;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM Sorting = 100

;

The table _ CEL_P2P _ACTIVITIES will only be inserted completely AFTER FINISHING the whole transforming job.

There is no way to measure the progress during the running transforming job, e.g. by

select _SORTING,count(*) from _CEL_P2P_ACTIVITIES group by _SORTING;

If you are not going to seperate the transforming job into several single transforming steps, is this behaviour correct?

Is there no way to commit every single step ?

Thank you in advance

Hi there,

You can execute transformations in parallel rather than sequentially where they can potentially block the execution of other transformations.

To achieve that, split a data job in different data jobs. That can start with assigning groups of transformations from one data job to separate ones (n transformations : 1 data job) and end with isolating single statements of a transformation in a separate data job (1 transformation : n data jobs).

Successively increasing the granularity like that and tracking data job execution performance while doing so allows for detecting and further isolating problematic transformations/statements step by step.

After running a query, you can then either check the seperate run time via the Event Collection logs or you can run the following statement. The result shows you the execution time in seconds for every query that started within the time frame you specified in the where condition.

SELECTDATE_TRUNC(

'second'

,query_start::

TIMESTAMP

)

as

query_start,session_id ,transaction_id,statement_id,node_name,LEFT

(query,100),ROUND((query_duration_us/1000000)::

NUMERIC

(10,3),3) duration_secFROM

query_profilesWHERE

query_start

BETWEEN

'2020-01-01 01:00:00'

AND

'2020-01-09 13:00:00'ORDER

BY

duration_sec

DESC

;

Best regards,

Celonis Data Science Team


Hi,

thanks for this helpful query, I will test it. But within a single transformation job consisting of serveral sequential steps: Will there be a commit after each sequential job? Could you select * from _CEL_P2P_ACTIVITIES where sorting = 10 during (!) the second insert (sorting 20) ?

Thank you in advance


Hi Hannes,

 

what do you mean by commit? If you want to select the data where sorting = 10 then your statement will be correct. The same with sorting = 20. You can then continue using this data for example to insert it into another table. In case this doesn't clarify your question please let me know and we can dive deeper.

 

Best

Kevin


Hi,

 

if you start a single data job consisting of several sequential following insert intos (usually in order to fill the ACTIVITIY-table) like:

 

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM ... -- Sorting = 10;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM ... -- Sorting = 20;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM ... -- Sorting = 30;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM ... -- Sorting = 40;

INSERT INTO _CEL_P2P_ACTIVITIES () SELECT * FROM ... -- Sorting = 50;

...

 

If you repeatedly start then following command in parallel to the running data job above, you won't get intermediate results of every step/sorting. Only after finishing the complete data job, you will get the whole result about all Sorting-Nr.

 

select Sorting,count(*) from _CEL_P2P_ACTIVITIES group by Sorting order by 1;

 


Hi Hannes, okay good to know. Usually, we are splitting one activity creating into one data job. However, this does not mean that you need to do so as well. Is there a specific reason why you are putting everything into one statement?

 

Best

Kevin


Additon: Splitting Datajobs is also much better for readability as well as troubleshooting in case something is wrong with the statement. It is always easier to look in 30 lines of code instead of 300 🙂.

 

Best

Kevin


No, it is just a working philosophy 🙂 Splitting is to recommend, but I prefer all activity-codes in one sheet :-)


Sounds good. Whatever works best for you 😉. Would be interesting to know if there are performance differences, have you tested it with both versions and noticed a performance difference by any chance?

 

Have a good Sunday!

Best Kevin


Reply