Changing the SAP USNAM column from pseudonymized to non-pseudonymized. Has anyone tried the approach described below?

Hi,

During the initial implementation, the decision was made to pseudonymize the USNAM column in multiple tables. It has now been agreed that we need the actual (not pseudonymized) user data in Celonis, just as it is in SAP.

Usually, unticking the Pseudonymize box in the extraction setup would do the trick, and I will do that for some smaller tables for which we run full loads. However, we have a couple of tables for which we only run delta loads, as they are way too big for full loads (over 40 GB). A full load for them would take ages, and I'm not entirely sure how Celonis would cope with such a huge load done at once.

The idea is to create a one-time mapping by extracting a second copy of that big table (BKPF is the original; let's call the second one BKPF_RAW) that contains only the 4 key columns and the non-pseudonymized USNAM. The next step would be to add a new column (let's call it USNAM_RAW) to the original BKPF and update it with the data from BKPF_RAW.USNAM, matching the two tables on their unique keys. After that, I would delete the original BKPF.USNAM and rename BKPF.USNAM_RAW to USNAM, so that the name stays the same for all the transformations that we have. The last part would be to uncheck the Pseudonymize box in the delta extractions, so that all new data is consistent with the data in the USNAM column we've changed.
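The steps above can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` as a stand-in database, not Celonis transformation SQL, so the exact `ALTER TABLE` syntax in your environment may differ. The key columns (MANDT, BUKRS, BELNR, GJAHR), the extra BLART column, the sample rows, and the function names are assumptions made for the demo. The sketch also adds one safety check that is not in the original plan: the old column is only dropped when every BKPF row found a match in BKPF_RAW. It needs SQLite 3.35 or newer for `DROP COLUMN`.

```python
import sqlite3

# Assumed key columns of BKPF (the post only says "the 4 key columns").
KEYS = ("MANDT", "BUKRS", "BELNR", "GJAHR")


def replace_usnam(conn: sqlite3.Connection) -> int:
    """Replace hashed BKPF.USNAM with the raw values from BKPF_RAW.

    Returns the number of BKPF rows that got no value from BKPF_RAW.
    The old column is dropped only when that number is 0.
    """
    match = " AND ".join(f"r.{k} = BKPF.{k}" for k in KEYS)

    # Step 1: add the helper column with the raw column's declared type.
    conn.execute("ALTER TABLE BKPF ADD COLUMN USNAM_RAW VARCHAR(48)")

    # Step 2: fill it from the mapping table, matching on the key columns.
    conn.execute(
        f"UPDATE BKPF SET USNAM_RAW = "
        f"(SELECT r.USNAM FROM BKPF_RAW r WHERE {match})"
    )

    # Safety check (an addition to the plan): rows without a raw value.
    unmatched = conn.execute(
        "SELECT COUNT(*) FROM BKPF WHERE USNAM_RAW IS NULL"
    ).fetchone()[0]

    if unmatched == 0:
        # Steps 3 and 4: drop the hashed column, keep the original name.
        conn.execute("ALTER TABLE BKPF DROP COLUMN USNAM")
        conn.execute("ALTER TABLE BKPF RENAME COLUMN USNAM_RAW TO USNAM")
    conn.commit()
    return unmatched


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory BKPF and BKPF_RAW with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE BKPF (MANDT TEXT, BUKRS TEXT, BELNR TEXT, "
        "GJAHR TEXT, USNAM VARCHAR(160), BLART TEXT)"
    )
    conn.execute(
        "CREATE TABLE BKPF_RAW (MANDT TEXT, BUKRS TEXT, BELNR TEXT, "
        "GJAHR TEXT, USNAM VARCHAR(48))"
    )
    conn.executemany(
        "INSERT INTO BKPF VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("100", "1000", "0000000001", "2023", "hash-a", "SA"),
            ("100", "1000", "0000000002", "2023", "hash-b", "KR"),
        ],
    )
    conn.executemany(
        "INSERT INTO BKPF_RAW VALUES (?, ?, ?, ?, ?)",
        [
            ("100", "1000", "0000000001", "2023", "JSMITH"),
            ("100", "1000", "0000000002", "2023", "AKOWALSKI"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print("unmatched rows:", replace_usnam(demo))
    print(demo.execute("SELECT BELNR, USNAM FROM BKPF").fetchall())
```

One side effect worth knowing about: after the drop and rename, USNAM ends up as the last column of the table, so anything that relies on column position rather than column name could behave differently.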

Has anyone tried this approach? It seems feasible, at least as far as replacing the data goes, but I'm wondering about potential risks with data synchronization, or any other issues that may show up later in transformations because of all these shenanigans (keeping the same data type for BKPF.USNAM and BKPF.USNAM_RAW might be important). I'm already preparing data for testing in the QA environment, but I would appreciate your input.

Thanks,

Karol

Hmmm.... I'd do a test first with a subset, as joining BKPF_RAW with the original BKPF could take a while. And during that time you shouldn't stop updates on BKPF.

We had to do some updates on tables extracted in real time, and the update took several hours... and then it took several more hours to resync the real-time extraction.

HTH

I did a test join for a subset that was about 15% of our actual full data set, and it took about 15 minutes. So hopefully the join for the entire table should take just shy of 2 hours, and as we are updating BKPF once a day, it should not be an issue (the time window to do it will be quite big).
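For reference, the estimate is a straight linear extrapolation, sketched below. The function name is made up for illustration, and the assumption that join time grows linearly with row count is only a rough guess; a join on a much larger table can scale worse than that, which is why "just shy of 2 hours" leaves some headroom over the computed figure.

```python
def extrapolate_minutes(sample_minutes: float, sample_percent: float) -> float:
    """Scale a measured runtime on a sample up to 100% of the data.

    Assumes runtime grows linearly with the amount of data, which is
    an optimistic simplification for joins.
    """
    return sample_minutes * 100 / sample_percent


if __name__ == "__main__":
    # 15 minutes for roughly 15% of the data.
    print(extrapolate_minutes(15, 15))  # 100.0 minutes
```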

The resync is what I'm worried about, in particular possible errors such as the data type changing once USNAM is no longer pseudonymized, which might cause a mismatch in downstream transformations and apps. Although, when I check a small subset of BKPF.USNAM that is not hashed, it looks like the data type is the same: VARCHAR(48) in both cases.

Correction: both are VARCHAR, but one is (160) and the other is (48).

Yep, I was expecting that, but I wanted to check first. A string hashed with SHA-512 produces a 512-bit hash, which is 128 characters long when represented as hex characters.
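The hash length is easy to confirm with Python's standard `hashlib`. This is just a sketch of plain SHA-512 over a made-up user name; whether the pseudonymization in the extractor uses exactly this (for example with or without a salt) is an assumption, so it only illustrates why a hashed column needs far more than 48 characters.

```python
import hashlib


def sha512_hex(value: str) -> str:
    """Return the SHA-512 digest of a string as lowercase hex characters."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = sha512_hex("JSMITH")  # hypothetical user name
    # 512 bits / 4 bits per hex character = 128 characters,
    # regardless of how long the input is.
    print(len(digest))  # 128
```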
