Hi @all,

 

We have a Data Pool with 2 Data Models: Data Model “ECI Data Model HTD - TABLES - TEST” has been duplicated from the Data Model “ECI Data Model HTD - TABLES”.

 

If we do a “Manual - Complete Reload” of both Data Models, the load times are very different:

 

  • “ECI Data Model HTD – TABLES” --> 34 min

Image1 

  • “ECI Data Model HTD – TABLES – TEST” --> 21 min

Image2 

 

Apparently, the load time of each of the tables is very similar:

Table1 

 

If we do a "Manual - Partial Reload" in both Data Models of a table that contains only 87 records, the load times are also very different:

 

  • “ECI Data Model HTD – TABLES” --> 19 min

Image3 

  • “ECI Data Model HTD – TABLES – TEST” --> 3 min

Image4 

What could be causing these differences? As mentioned above, the Data Model with the better load times was duplicated from the other one.

 

Thanks in advance!

Hi Eduardo,

Just trying to understand more: what is different between the two data models?

Do they refer to the same connection? (I believe yes.)

Which load takes more time, the first or the second?

 

 

Regards

Ayan

 

 


Hi Ayan,

 

Thank you very much for your answer!

 

There is no difference between the Data Models; they belong to the same Data Pool. One Data Model has been duplicated from the other.

 

The tables and data of both Data Models are exactly the same.

 

Complete loads of the original Data Model take approximately 34 min. Partial loads of the original Data Model take approximately 19 min.

 

Complete loads of the duplicated Data Model take approximately 21 min (13 min less than the original Data Model). Partial loads of the duplicated Data Model take approximately 3 min (16 min less than the original Data Model).

 

I hope this makes it clearer.

 

Thanks


Hi Eduardo,

Thank you for the detailed response. This is strange behavior.

Kindly let me know: when both models are run with the same option (Complete or Partial), which one takes more time, the one run first or the one run second?

 

This is just a wild guess: the run of the first model might load data into memory, and that may slow down the run of the second data model when both are run with the same option (Complete or Partial).

 

In parallel, I suggest you raise a ticket with Celonis Helpdesk.

 

Regards

Ayan

 


Hi Ayan,

 

Yes, it's strange behavior.

 

Here are the details, step by step:

 

1. I have a Data Pool with 1 Data Model "A".

2. When I run a Complete load of Data Model "A", it takes 34 min.

3. When I run a Partial load of a table that only has 87 records in Data Model "A", it takes 19 min.

 

4. Now I duplicate Data Model "A" with the "Duplicate" option; we will call the copy Data Model "B":

Image5

5. When I run a Complete load of Data Model "B", it takes 21 min.

6. When I run a Partial load of a table that only has 87 records in Data Model "B", it takes 3 min.

 

NOTE: It does not matter whether I run the loads (Complete or Partial) on Data Model A or B first; the times are the ones listed above.

 

As you can see, duplicating the Data Model produces the same tables and the same data. So why is there this difference in performance?

 

I will follow your recommendation and open a ticket with Celonis Helpdesk.

 

Thank you very much.

 

Regards


Hi Eduardo,

 

Is there a difference if you reload them in reverse order? I.e., can you try reloading DM B first and then DM A, and compare the load times? Are you using views instead of tables in your transformations, by any chance?

 

My theory is that if you're using views, the first reload creates a temp table from a view, while the second reload skips that step and just uses the temp table that hasn't been dropped yet. If that is correct, the initial load will be longer (regardless of whether it's DM A or B), and the second load quicker.


Hi Eugene,

 

Thank you very much for your answer.

 

I already have an answer from Celonis. I share it with all of you 😊

 

"The difference in load times can be explained by the PQL engine query warmup phase which is part of every complete or partial data model load.

 

During the PQL warmup phase, Celonis is pre-executing the most used long-running PQL queries that have been triggered by users for that data model.

 

When you copy a data model, the query warmup phase won’t do anything because no queries have been triggered yet for that data model. Only once you start using the data model in analyses and views would the warmup phase increase in time.

 

You would see an increase in the load time for duplicated data model if you assign it to an analysis/knowledge AND actively use it by triggering PQL queries.

The more long-running PQL queries have been executed, the more you would see an increase in the data model load times for the duplicated data model.

 

If you never use the duplicated data model, the query warmup won’t do anything."
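
To make the explanation above concrete, here is a rough toy model in Python. It is only a sketch of the reasoning, not how Celonis implements the warmup: the class `DataModel`, its fields, and the assumption that warmup time simply adds up on top of the table load are all hypothetical. The 21 min and 34 min figures are the Complete load times from this thread; the split of the 13 min gap into two queries is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataModel:
    """Hypothetical toy model; none of these names are Celonis APIs."""
    name: str
    table_load_min: float  # minutes spent loading the tables themselves
    # Durations (minutes) of long-running PQL queries users triggered earlier
    recorded_query_min: List[float] = field(default_factory=list)

    def reload_minutes(self) -> float:
        # Assumption: reload time = table load + re-running recorded queries
        return self.table_load_min + sum(self.recorded_query_min)

    def duplicate(self, new_name: str) -> "DataModel":
        # The copy has the same tables and data but no query history yet
        return DataModel(new_name, self.table_load_min)


# Figures from the thread: 21 min table load, 34 - 21 = 13 min implied warmup.
# The 8 + 5 split is made up for the example.
original = DataModel("A", 21.0, [8.0, 5.0])
copy = original.duplicate("B")

print(original.reload_minutes())  # 34.0
print(copy.reload_minutes())      # 21.0
```

In this simplified picture, the copy would only get slower once queries are recorded for it, which matches what the Celonis answer describes. The real gap also differs between Complete (13 min) and Partial (16 min) loads, so the additive assumption is only approximate.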

 

Thank you all!!


Hi Eduardo ,

Thank you so much for sharing the solution explanation. I really appreciate your kind gesture.

 

Regards

Ayan

