Normalize calculation result in table based on another value?

Dear fellow process miners,

I’m currently struggling implementing a normalization of a result in a table. It seems to be impossible to achieve that an OLAP table in Celonis.

Here’s my problem:

Every case has a list of 0 to N events that affect the duration of the case. It’s a simple parent-child relationship in the data model. If no event happened to a case, there will be no record in the child table.

Now I created a simple OLAP table with the event category (event.category) as dimension.
My two KPI columns so far are

  • The number of events of that category as COUNT(DISTINCT event.key)
  • The sum of delay in the process, caused by that event as SUM(event.delay).

Now, I want to normalize the delay to 100 total cases. Say, I have 15 events, causing 15 minutes of delay. In total, I have 250 cases. What I’m trying to achieve is the following simple formula:
(SUM(delay)/Total number of cases)*100, which should be (15/250)*100 = 6.

However, this seems to be impossible to achieve in Celonis as

  • GLOBAL() will only give me the number of cases shown in the table, which will exclude the 235 cases having no events (As no column of the case table is involved as dimension)
  • A static variable will not be filtered with selections
  • A PU-function will not work, because it will also not include the 235 cases without an event.

At the moment, I don’t see a chance to achieve such calculation due to the above limitations.

Did any fellow process miner stumble upon such thing and find a solution to it? The cherry on the cake would be to apply a filtering on the number of cases (e.g. an attribute on the case table) for the normalization (e.g. with a case when statement)

Any help would be greatly appreciated.

Thank you

Best
Max

Hello Max,

Welcome to the Celonis Community :slight_smile:

Thank you for the detailed description of your problem and what you have already tried to do to solve it. Unfortunately, I wasn’t able to find a satisfactory solution for you, but I did reach out to my colleagues within Celonis and we are discussing ways to help you.

We’ll get back to you as soon as we’ve come up with something.

Best wishes,

Calandra

Hello Max,

have you tried using GLOBAL with COUNT_TABLE to get the overall number of cases?
The query part would look like this:
GLOBAL ( COUNT_TABLE ( "CaseTable" ) )

Please let us know if this works.

Cheers
David

Hi David,

yes, I have tried that, as mentioned above. However, this will not work because the table does not contain all selected cases. E.g.: My selection includes 250 cases, of which only 150 count into the table (because of the 0…N relation). Then GLOBAL(COUNT_TABLE(xx)) will only return 150 as a result.
This is also stated in the official Celonis docs: https://help.celonis.de/display/PQL45/GLOBAL

Best
Max

Hi Max,

now I’m a little bit confused. So do you want selections and filters to be taken into account or not? In the latter case, a static variable should do the job.

Cheers
David

Hi David,

Selections and Filters must be taken into account. Thus, a static variable will not do the trick.
However, GLOBAL will also NOT do the trick, as GLOBAL will only aggregate over the values of a column within a table:

With a global aggregation it is possible to calculate an additional aggregation function over all values of a column.

(From the official docs).

That, however, is not necessarily the number of selected/filtered cases, but the number of cases of that particular table. This number might or might not be lower (since potentially not all cases are included by the dimensions. Cases without an entry in the “event” table are not captured and thus not part of the GLOBAL function.

It would be great if Celonis would offer an option for the static variables that tells it to take selections into account.

Thank you

Best
Max

Hi Max,

GLOBAL will aggregate all the values of the column you specify inside the aggregation, taking all filters and selections into account. So when you have
GLOBAL ( COUNT_TABLE ( "CaseTable" ) )

Instead, when you do
GLOBAL ( COUNT ( DISTINCT "ActivityTable"."CaseId" ) )
the number might be different because not all cases are contained in the activity table.

Now it would be interesting what kinds of filters you are using. If the filter only contains the case table, GLOBAL should work. If you filter on things of the activity table, the join is calculated between the case and activity table and therefore, the case table rows that dont have a join partner are not part of the GLOBAL aggregation anymore. But this is desired, because when you filter on a specific category in the activity table for example, all cases without an activity also dont have this specific category, and therefore they shouldn’t be considered in the GLOBAL. If you filter on full cases, use the case table to filter on them rather than the activity table.

Best
David

After checking the exact query directly together with our Process Mining Consultants, we found the reason for the behavior, which I’d like to share here:

It’s basically because of what I assumed in my previous post. Let’s take a look at the following example:

Activities

Case Activity Timestamp
1 A 08.11 08:00
1 B 08.11 08:10
1 C 08.11 08:20
2 A 08.11 08:30

Cases

Case Category Delay
1 C1 100
2 C1 200
3 C2 600

In the Case-table, there are three cases of different categories with a delay number, but in the Activity-table, we only have events corresponding to two cases.

In an OLAP table, we now want to calculate for each category the sum of he corresponding delays and divide that by the total number of cases.

The query would be:

Dimension: "Cases"."Category"
KPI: SUM("Cases"."Delay") / GLOBAL ( COUNT_TABLE ("Cases") )

This works and returns the following result:

Category Delay share
C1 100
C2 200

Until now, everything is as expected. C1 has value 100 because (100+200)/3 = 100 and C2 has value 200 because 600/3 = 200.

Now let’s assume there is an active Analysis filter or selection which filters out an activity:

FILTER "Activities"."Activity" != 'A';

Our activity table then looks like this:

Case Activity Timestamp
1 A 08.11 08:00
1 B 08.11 08:10
1 C 08.11 08:20
2 A 08.11 08:30

As this is a column of a table which is not in the table we use in our actual query, a join is performed between the two tables.
So there is a left outer join between the Activities table (left / N side) and the Case table (right / 1 side), the Case table looks like this after the join with the filtered Activity table:

Case Category Delay
1 C1 100
2 C1 200
3 C2 600

In the example, only case 1 from the Activity table has a join partner in the Case table (case 2 of the original Activity table is filtered out completely, and case 3 was never present in the Activity table).

So our example query with the filter would produce this result:

Category Delay share
C1 100

The result is 100 because GLOBAL also respects the filtering, thus it returns 1 instead of 3 in the example.

This is a desired behavior, because if cases are filtered out, it should be respected inside the GLOBAL as well.

You can either use a static variable which does not take filters and selections into account(except Analysis filter), or you use GLOBAL which takes all filters into account. It is not possible to apply a filter to only specific parts of a query. It is possible to “filter” on the values that are aggregated by setting the values that shouldn’t be counted/summed up to NULL with a CASE WHEN statement and thereby not using a filter statement. This however doesnt reduce the number of rows in the result table.

We are happy to hear your suggestions on how to improve the situation. You say that filters should be taken into account in specific situations / in certain parts of the query, but they shouldn’t be taken into account in other situations / in other parts of the same query. Therefore it’s hard for me to generalize the problem, so I’m happy to hear wishes or feature proposals from your side.

Best
David

Interesting read, thanks for the detailed explanation of the functionality. I actually agree with the behavior you describe, @d.becher.

I think this design decision for your eventlog is the root issue. From my perspective, every case should have at least one activity in the eventlog - even if it’s just a “Create Case” activity and the Delay value for that activity is 0. Then that activity could be included as a dimension in your OLAP table and ALL cases would be counted without adding additional Delays to your calculation.

-Tyler

Thank you @d.becher for that detailled explanation.

Although I now fully understand that “GLOBAL” will take all Filters into account (which by the way kind of contradicts the name “global”), it might not always be useful - as seen in this case.
I escaped that problem by removing the filter, clustering all unwanted cases into one row and highlighting it visually. Not exactly the cleanest solution, but the normalization works now.

Looking at the topic from a higher level, one great feature would be to access KPIs in an “escaped” way.
Say you define a Saved Formula “Count Cases where attribute x = y”: It would be great to use that formula in a table, returning the same result as if I would use it in a single-KPI Component. That could be a case count or a ratio or anything taking the currently selected (!= filtered) dataset into account.
Being able to e.g. say "ESCAPE(KPI(“myKPI”)) would be a great addition to PQL, allowing you to use results that are usually already shown in Single-KPIs.
That would be pretty neat and a consistent way to “put things into perspective”.

@MN_Process_Mining: I think there might be a bit of a confusion here. “Events” in this case does not mean “Activities”. There’s actually no case having no entries in the activities table. The “event” table in my case is a separate table.

I marked @d.becher’s reply as solution. Thank you for your support.

Hi @maximilian.hoffmann,

sorry for the question again, but wouldn’t this be exactly what you could achieve with a static variable? A static variable will not take filters into account, GLOBAL will do it (Only one exception, the Analysis Filter is also respected in a variable).

Thanks for clarifying this
David

Hi Daniel. But Static variables do not take Selections into account, or do they?

No, they don’t. For every query, Selections and Analysis/Sheet/Component filters are handled the same from a PQL perspective, so distinguishing between both kinds of filters is not done anywhere so far. But the behavior is the same with selections, if you filter by clicking on an activity name in an olap table, the result will be the same as described in the example above.

David