Normalize calculation result in table based on another value?

Dear fellow process miners,
Im currently struggling implementing a normalization of a result in a table. It seems to be impossible to achieve that an OLAP table in Celonis.
Heres my problem:
Every case has a list of 0 to N events that affect the duration of the case. Its a simple parent-child relationship in the data model. If no event happened to a case, there will be no record in the child table.
Now I created a simple OLAP table with the event category (event.category) as dimension.
My two KPI columns so far are

The number of events of that category as COUNT(DISTINCT event.key)
The sum of delay in the process, caused by that event as SUM(event.delay).

Now, I want to normalize the delay to 100 total cases. Say, I have 15 events, causing 15 minutes of delay. In total, I have 250 cases. What Im trying to achieve is the following simple formula:
(SUM(delay)/Total number of cases)*100, which should be (15/250)*100 = 6.
However, this seems to be impossible to achieve in Celonis as

GLOBAL() will only give me the number of cases shown in the table, which will exclude the 235 cases having no events (As no column of the case table is involved as dimension)
A static variable will not be filtered with selections
A PU-function will not work, because it will also not include the 235 cases without an event.

At the moment, I dont see a chance to achieve such calculation due to the above limitations.
Did any fellow process miner stumble upon such thing and find a solution to it? The cherry on the cake would be to apply a filtering on the number of cases (e.g. an attribute on the case table) for the normalization (e.g. with a case when statement)
Any help would be greatly appreciated.
Thank you
Best
Max

Page 1 / 1

Hello Max,
Welcome to the Celonis Community

Thank you for the detailed description of your problem and what you have already tried to do to solve it. Unfortunately, I wasnt able to find a satisfactory solution for you, but I did reach out to my colleagues within Celonis and we are discussing ways to help you.
Well get back to you as soon as weve come up with something.
Best wishes,
Calandra

Hello Max,
have you tried using GLOBAL with COUNT_TABLE to get the overall number of cases?
The query part would look like this:
GLOBAL ( COUNT_TABLE ( "CaseTable" ) )
Please let us know if this works.
Cheers
David

Hi David,
yes, I have tried that, as mentioned above. However, this will not work because the table does not contain all selected cases. E.g.: My selection includes 250 cases, of which only 150 count into the table (because of the 0N relation). Then GLOBAL(COUNT_TABLE(xx)) will only return 150 as a result.
This is also stated in the official Celonis docs: https://help.celonis.de/display/PQL45/GLOBAL
Best
Max

Hi Max,
now Im a little bit confused. So do you want selections and filters to be taken into account or not? In the latter case, a static variable should do the job.
Cheers
David

Hi David,
Selections and Filters must be taken into account. Thus, a static variable will not do the trick.
However, GLOBAL will also NOT do the trick, as GLOBAL will only aggregate over the values of a column within a table:

With a global aggregation it is possible to calculate an additional aggregation function over all values of a column.

(From the official docs).
That, however, is not necessarily the number of selected/filtered cases, but the number of cases of that particular table. This number might or might not be lower (since potentially not all cases are included by the dimensions. Cases without an entry in the event table are not captured and thus not part of the GLOBAL function.
It would be great if Celonis would offer an option for the static variables that tells it to take selections into account.
Thank you
Best
Max

Hi Max,
GLOBAL will aggregate all the values of the column you specify inside the aggregation, taking all filters and selections into account. So when you have
GLOBAL ( COUNT_TABLE ( "CaseTable" ) )
Instead, when you do
GLOBAL ( COUNT ( DISTINCT "ActivityTable"."CaseId" ) )
the number might be different because not all cases are contained in the activity table.
Now it would be interesting what kinds of filters you are using. If the filter only contains the case table, GLOBAL should work. If you filter on things of the activity table, the join is calculated between the case and activity table and therefore, the case table rows that dont have a join partner are not part of the GLOBAL aggregation anymore. But this is desired, because when you filter on a specific category in the activity table for example, all cases without an activity also dont have this specific category, and therefore they shouldnt be considered in the GLOBAL. If you filter on full cases, use the case table to filter on them rather than the activity table.
Best
David

After checking the exact query directly together with our Process Mining Consultants, we found the reason for the behavior, which Id like to share here:
Its basically because of what I assumed in my previous post. Lets take a look at the following example:
Activities

Case
Activity
Timestamp

1
A
08.11 08:00

1
B
08.11 08:10

1
C
08.11 08:20

2
A
08.11 08:30

Cases

Case
Category
Delay

1
C1
100

2
C1
200

3
C2
600

In the Case-table, there are three cases of different categories with a delay number, but in the Activity-table, we only have events corresponding to two cases.
In an OLAP table, we now want to calculate for each category the sum of he corresponding delays and divide that by the total number of cases.
The query would be:
Dimension: "Cases"."Category"
KPI: SUM("Cases"."Delay") / GLOBAL ( COUNT_TABLE ("Cases") )
This works and returns the following result:

Category
Delay share

C1
100

C2
200

Until now, everything is as expected. C1 has value 100 because (100+200)/3 = 100 and C2 has value 200 because 600/3 = 200.
Now lets assume there is an active Analysis filter or selection which filters out an activity:
FILTER "Activities"."Activity" != 'A';
Our activity table then looks like this:

Case
Activity
Timestamp

1
A
~~08.11 08:00~~

1
B
08.11 08:10

1
C
08.11 08:20

2
A
~~08.11 08:30~~

As this is a column of a table which is not in the table we use in our actual query, a join is performed between the two tables.
So there is a left outer join between the Activities table (left / N side) and the Case table (right / 1 side), the Case table looks like this after the join with the filtered Activity table:

Case
Category
Delay

1
C1
100

2
C1
~~200~~

3
C2
~~600~~

In the example, only case 1 from the Activity table has a join partner in the Case table (case 2 of the original Activity table is filtered out completely, and case 3 was never present in the Activity table).
So our example query with the filter would produce this result:

Category
Delay share

C1
100

The result is 100 because GLOBAL also respects the filtering, thus it returns 1 instead of 3 in the example.
This is a desired behavior, because if cases are filtered out, it should be respected inside the GLOBAL as well.
You can either use a static variable which does not take filters and selections into account(except Analysis filter), or you use GLOBAL which takes all filters into account. It is not possible to apply a filter to only specific parts of a query. It is possible to filter on the values that are aggregated by setting the values that shouldnt be counted/summed up to NULL with a CASE WHEN statement and thereby not using a filter statement. This however doesnt reduce the number of rows in the result table.
We are happy to hear your suggestions on how to improve the situation. You say that filters should be taken into account in specific situations / in certain parts of the query, but they shouldnt be taken into account in other situations / in other parts of the same query. Therefore its hard for me to generalize the problem, so Im happy to hear wishes or feature proposals from your side.
Best
David

Interesting read, thanks for the detailed explanation of the functionality. I actually agree with the behavior you describe, @d.becher.

maximilian.hoffmann:

If no event happened to a case, there will be no record in the child table.

I think this design decision for your eventlog is the root issue. From my perspective, every case should have at least one activity in the eventlog - even if its just a Create Case activity and the Delay value for that activity is 0. Then that activity could be included as a dimension in your OLAP table and ALL cases would be counted without adding additional Delays to your calculation.
-Tyler

Thank you @d.becher for that detailled explanation.
Although I now fully understand that GLOBAL will take all Filters into account (which by the way kind of contradicts the name global), it might not always be useful - as seen in this case.
I escaped that problem by removing the filter, clustering all unwanted cases into one row and highlighting it visually. Not exactly the cleanest solution, but the normalization works now.
Looking at the topic from a higher level, one great feature would be to access KPIs in an escaped way.
Say you define a Saved Formula Count Cases where attribute x = y: It would be great to use that formula in a table, returning the same result as if I would use it in a single-KPI Component. That could be a case count or a ratio or anything taking the currently selected (!= filtered) dataset into account.
Being able to e.g. say "ESCAPE(KPI(myKPI)) would be a great addition to PQL, allowing you to use results that are usually already shown in Single-KPIs.
That would be pretty neat and a consistent way to put things into perspective.
@MN_Process_Mining: I think there might be a bit of a confusion here. Events in this case does not mean Activities. Theres actually no case having no entries in the activities table. The event table in my case is a separate table.
I marked @d.bechers reply as solution. Thank you for your support.

Hi @maximilian.hoffmann,
sorry for the question again, but wouldnt this be exactly what you could achieve with a static variable? A static variable will not take filters into account, GLOBAL will do it (Only one exception, the Analysis Filter is also respected in a variable).
Thanks for clarifying this
David

Hi Daniel. But Static variables do not take Selections into account, or do they?

No, they dont. For every query, Selections and Analysis/Sheet/Component filters are handled the same from a PQL perspective, so distinguishing between both kinds of filters is not done anywhere so far. But the behavior is the same with selections, if you filter by clicking on an activity name in an olap table, the result will be the same as described in the example above.
David

Reply

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded