In order to save APC/Space in celonis i am looking for any standard tables which can give list of columns and the size of the column by table and schema. Also if there is a way to find out if it is used in any transformation that will be good to say it is not used anywhere and it can be dropped from the schema and table.

how can we find unused columns for a table in any transformation. Trying to understand the size of that table column and see if we can drop the unused columns to save space.

+8

manuel.wetze
Level 9
Forum|Forum|1 year ago
January 29, 2024

Highly intested as well on the column granularity.

I do use the Marketplace App "Monitoring" from Celonis to at least track and drill down APC on Table level. But this does not solve your question regarding APC per column or even which ones are unused.

Like

A

akshay.harid
Level 7
Forum|Forum|1 year ago
January 29, 2024

Hi Muthappan Alagappan,

From my perspective finding out the list of columns which has not been used in transformation can be done through ML workbench using python, but that is a complicated one. I cannot share the code that I am using but I could definitely give an idea on how it can be done:

1) Using pycelonis find out your dj in which your transformation script is.

2) Using regular expression module in python find out a pattern using which you can list out table and column names.

3)Extract your transformation statement from dj using python and then apply the regex pattern in order to retrieve table and column names.

4)Next using python access your extraction datapool , from the dp find out list of tables/columns that have been extracted , then compare the list of tables there with the earlier list you made from transformations and those tables/columns that are there in extraction dp but not in transformation dp are the one which have been extracted but not used.

5) using requests and json module in python you can also retrieve the size of the extracted tables by giving api requests and then merge this information with the table list.

Thanks,

Akshay

Like

+1

muthappan.alaga
Author
Level 8
Forum|Forum|1 year ago
January 29, 2024

Highly intested as well on the column granularity.

I do use the Marketplace App "Monitoring" from Celonis to at least track and drill down APC on Table level. But this does not solve your question regarding APC per column or even which ones are unused.

Thank you Manuel, yes i do use table level monitoring using celonis Monitoring view, but this doesnt go to the column level.

i am also trying to understand if there is any standard vertica tables gives the size at the column level, but no luck yet.

Like

+1

muthappan.alaga
Author
Level 8
Forum|Forum|1 year ago
January 29, 2024

Hi Muthappan Alagappan,

From my perspective finding out the list of columns which has not been used in transformation can be done through ML workbench using python, but that is a complicated one. I cannot share the code that I am using but I could definitely give an idea on how it can be done:

1) Using pycelonis find out your dj in which your transformation script is.

2) Using regular expression module in python find out a pattern using which you can list out table and column names.

3)Extract your transformation statement from dj using python and then apply the regex pattern in order to retrieve table and column names.

4)Next using python access your extraction datapool , from the dp find out list of tables/columns that have been extracted , then compare the list of tables there with the earlier list you made from transformations and those tables/columns that are there in extraction dp but not in transformation dp are the one which have been extracted but not used.

5) using requests and json module in python you can also retrieve the size of the extracted tables by giving api requests and then merge this information with the table list.

Thanks,

Akshay

Thank you Akshay, i understand your approach, but the challenge i have is same columns might be used at multiple tables and multiple data jobs.

But i got couple of ideas from your approach

From the data pool you can export the data pool in json file. Do a regular expressions to find where used and how many times it is used in that json file. Drawback, if a column is used in multiple tables then our count might be wrong.
i have a raw table and list of columns, i have transformed table and the list of columns in the model level, compare those two tables to see what is used, but drawback is if the columns are used just in the transformation but not used in any data model tables, i will miss the counts/columns.

Thank you

Muthappan

Like

matt.witty13
Level 10
Forum|Forum|1 year ago
January 29, 2024

There are system tables within SQL Vertica you can use.

SELECT *

FROM COLUMNS

This will show you every column and every table within your EMS instance. I've built out a number of data integrity tables using this one and "projections". Projections will tell you whether a table contains the full metadata to be able transform in its most efficient way.

Like

+1

muthappan.alaga
Author
Level 8
Forum|Forum|1 year ago
January 29, 2024

There are system tables within SQL Vertica you can use.

SELECT *

FROM COLUMNS

This will show you every column and every table within your EMS instance. I've built out a number of data integrity tables using this one and "projections". Projections will tell you whether a table contains the full metadata to be able transform in its most efficient way.

Hi Matt,

Thank you for that COLUMNS and PROJECTIONS table, i dont see any columns which reflect size of the column.

Thanks

Muthappan

Like

+26

janpeter.van.d
Level 12
Forum|Forum|1 year ago
January 29, 2024

Thank you Akshay, i understand your approach, but the challenge i have is same columns might be used at multiple tables and multiple data jobs.

But i got couple of ideas from your approach

From the data pool you can export the data pool in json file. Do a regular expressions to find where used and how many times it is used in that json file. Drawback, if a column is used in multiple tables then our count might be wrong.
i have a raw table and list of columns, i have transformed table and the list of columns in the model level, compare those two tables to see what is used, but drawback is if the columns are used just in the transformation but not used in any data model tables, i will miss the counts/columns.

Thank you

Muthappan

Hi @muthappan.alaga and @akshay.harid,

We created a script in Python with this approach, and used sql_metadata · PyPI to get the table names and columns from the SQL scripts.

One downside is that this method is not bullet-proof when using 'WITH'-statements and/or 'SELECT *' statements, since the column is then invisible, or is linked to the with-clausule and not the original table. If this is not applicable to you, then this a great way of creating insights in your Transformations.

Best regards,

Jan-peter

Like

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded