
Hi there,

I am looking for an efficient way to reduce the size of my data model:

I already reduced the time period of the ingested data. As a second step I want to get rid of some unused table columns.

To identify unused table columns I thought about doing the following:

Export a .txt documentation of each analysis, then search (Ctrl+F) for a specific table (e.g. LIPS) and document which columns are actually in use in that analysis. I repeat this for all analyses, and then I know which columns I can exclude from the extractions because they are not used anywhere.

 

It's not as much manual work as it sounds but still a lot. Do you know a better way to do this?

How do you avoid such work in the future? Is there a standard procedure how to handle this problem or not run into it in the first place?

 

Thanks for your help

Daniel

 

We try to delete ingestion tables that are no longer used.

Also, we almost always use views instead of tables to create the ones that appear in the process data model (unless there are performance considerations).

 

AFAIK the APC cost is measured on data ingestion, including the process model tables.

 

HTH


Thanks.

The APC limit and the duration of the data load are exactly the reasons why we try to reduce the model. We work with views as well and already deleted whole tables which are not in use. I was hoping to find a more efficient way to review the tables we use and eliminate some of the columns.


Well, before that I'd try to reduce the size of the columns in the tables. It is easier, and depending on the number of records, going for example from VARCHAR(50) to INT when you have numbers stored as strings can reduce storage a lot.

 

That said, for what you want to do, the only feasible way is to use Python to iterate over all the components of all the sheets of all the analyses and inspect the PQL to extract which columns are being used.

Keep in mind that most of the KPIs will be "hidden" in saved formulas.

 

In the end it means reviewing the huge JSON structure that is the analysis.draft.data property.

 

https://celonis.github.io/pycelonis/1.7.1/reference/celonis_api/process_analytics/analysis/
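To illustrate the idea, here is a minimal sketch of the JSON-walking part. The helper name `find_columns` and the regex are my own (not part of PyCelonis), and the connection lines at the bottom are assumptions you'd adapt to your team URL and token handling:

```python
import re

def find_columns(node, table="LIPS", found=None):
    """Recursively walk a nested dict/list (e.g. the analysis.draft.data
    JSON) and collect column references of the given table from any
    embedded PQL strings, matching LIPS.MATNR or "LIPS"."MATNR" style."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            find_columns(value, table, found)
    elif isinstance(node, list):
        for item in node:
            find_columns(item, table, found)
    elif isinstance(node, str):
        # optional quotes around table and column, word boundary so
        # e.g. "SOMELIPS" does not match "LIPS"
        pattern = r'"?\b{t}\b"?\."?([A-Za-z0-9_]+)'.format(t=re.escape(table))
        found.update(m.upper() for m in re.findall(pattern, node))
    return found

# Hypothetical connection code (adjust to your setup):
# from pycelonis import get_celonis
# celonis = get_celonis("https://<team>.celonis.cloud", api_token="...")
# used = set()
# for analysis in celonis.analyses:
#     used |= find_columns(analysis.draft.data, table="LIPS")
```

Running this over every analysis (and over the saved formulas as well) gives you the set of columns that are referenced somewhere; everything else in the extraction is a candidate for removal.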

 


Thanks. I will try that.

Good thinking about the saved formulas. I totally missed that.


Hi @daniel.n,

 

The suggestion of @Guillermo Gost would indeed be my choice, and we have applied it successfully before. Check the package sql-metadata (link: sql-metadata · PyPI), which will help you a lot!

 

Kind regards,

Jan-peter

