One Transforming job consisting of x steps: no commit?

Hi to all,

if you create a transformation job like this:

INSERT INTO _CEL_P2P_ACTIVITIES (…) SELECT * FROM … – Sorting = 10
;

INSERT INTO _CEL_P2P_ACTIVITIES (…) SELECT * FROM … – Sorting = 20
;

INSERT INTO _CEL_P2P_ACTIVITIES (…) SELECT * FROM … – Sorting = 30
;

INSERT INTO _CEL_P2P_ACTIVITIES (…) SELECT * FROM … – Sorting = 100
;

The table _ CEL_P2P _ACTIVITIES will only be inserted completely AFTER FINISHING the whole transforming job.

There is no way to measure the progress during the running transforming job, e.g. by

select _SORTING,count(*) from _CEL_P2P_ACTIVITIES group by _SORTING;

If you are not going to seperate the transforming job into several single transforming steps, is this behaviour correct?

Is there no way to commit every single step ?

Thank you in advance

Hi there,

You can execute transformations in parallel rather than sequentially – where they can potentially block the execution of other transformations.

To achieve that, split a data job in different data jobs. That can start with assigning groups of transformations from one data job to separate ones (n transformations : 1 data job) and end with isolating single statements of a transformation in a separate data job (1 transformation : n data jobs).
Successively increasing the granularity like that and tracking data job execution performance while doing so allows for detecting and further isolating problematic transformations/statements step by step.

After running a query, you can then either check the seperate run time via the Event Collection logs or you can run the following statement. The result shows you the execution time in seconds for every query that started within the time frame you specified in the where condition.

SELECT

DATE_TRUNC( 'second' ,query_start:: TIMESTAMP ) as query_start,

session_id ,

transaction_id,

statement_id,

node_name,

LEFT (query,100),

ROUND((query_duration_us/1000000):: NUMERIC (10,3),3) duration_sec

FROM query_profiles

WHERE query_start BETWEEN '2020-01-01 01:00:00' AND '2020-01-09 13:00:00'

ORDER BY duration_sec DESC ;

Best regards,
Celonis Data Science Team

Hi,

thanks for this helpful query, I will test it. But within a single transformation job consisting of serveral sequential steps: Will there be a commit after each sequential job? Could you select * from _CEL_P2P_ACTIVITIES where sorting = 10 during (!) the second insert (sorting 20) ?

Thank you in advance