How to count how often a specific sequence of activities occurs


#1

Hello there,

I´m looking for support to find a solution on how to count how often a specific sequence occurs.

Celonis version: 4.2.

Goal:
Count how often the following sequence of activities “A-B-C” occurs.
A direct following B direct following C

Comments:
The same sequence can happen multiple times in the same CASE (let´s say within the same production order).

Each ACTIVITY (A or B or C) can happen multiple times, but the sequence is valid only if the same order is obtained (e.g: AAA-BB-C, AA-B-C, or A-BBBB-CC are valid sequences).

If there are multiple sequences within the same CASE, then we can have other activities in between.
E.g, AAA-BB-C-F-G-A-F-A-BBBB-CC … here the sw should return the value 2.

Formula used for single KPI calculation:

Option 1
COUNT(CASE WHEN **
** PROCESS EQUALS

‘A’ TO ‘B’ TO ‘C’
THEN 1.0
ELSE 0.0
END
** )
*

*Option 2
SUM(
** MATCH_PROCESS (

LOOP [‘A’] AS loop_A,
LOOP[‘B’] AS loop_B,
LOOP[‘C’] AS loop_C
CONNECTED BY DIRECT [loop_A, loop_B], DIRECT [loop_B, loop_C])
)
+
SUM(
** MATCH_PROCESS (

LOOP [‘A’] AS loop_A,
LOOP[‘B’] AS loop_B,
LOOP[‘C’] AS loop_C,
LOOP[‘A’,‘B’,‘C’] AS loop_ABC
CONNECTED BY EVENTUALLY [loop_ABC, loop_A], DIRECT [loop_A, loop_B], DIRECT [loop_B, loop_C])
)

Expected results: A-B-C occurs 7 times.

Celonis Results: see table. there is a significant gap.

Inputs.docx (13.8 KB)


#5

Hi,

you can use the function VARIANT() to display all activities per case:

Based on this, you can then add KPI like case count, with which you can count the frequency of a particular activity sequence.
For more information on the Variant() function you can go to the PQL library in the manual, which you can find at the HELP section located at your user name dropdown.

Best regards,
Viana


#7

Thanks for the feedback.

However, it doesn´t seem to be a solution for my case.
As per PQL manual,“Variant aggregates all activities of a case into a string, which represents a variant of the process”.
In my case I need to look at specific sub-sequences within the same CASE.
Example: how often the sequence A-B-C occurs.
Case1 --> Variant (each letter is an activity): G;H;T;A;A;A;A;B;B;B;C;C;C;K;T;A;A;B;B;D;H;A;A;B;C;J;S;D.
Expected amount: 2
Celonis calclulation: 1

Let me attach a file containing the last developped formula.
Appreciated if Celonis Team could support me on overcoming this challenge which is key in multiple production KPIs.

190128_Request to Celonis Support.docx (20.8 KB)

With kindest regards,
Francesca


#8

Hi Francesca,

You wrote: “This formula writes “1” every time the sequence occurs in each “C” line (OLAP Table – see below). I would like to see “1” only in the last “C” of my sequence (see 0 in the column “Expected”).”

Would it be a problem if the 1 is written next to the first “C” of the sequence rather than the last?

If I understand you correctly then the sequence your interested in starts with a single ‘A’ to as many ‘B’ as you like and then ending with single ‘C’ i.e ‘A’,’B’,’B’,’B’,’B’,’C’. If there are extra ‘A’s at the front or ‘C’s at the back (i.e. ‘A’, ‘A’,’B’,’B’,’B’,’B’,’C’ ,’C’,’C’ ) then that has no effect on the fact one sequence has taken place.

If you change your process definition to

CASE WHEN     
         "_CEL_PP_ACTIVITIES"."ACTIVITY_EN" NOT IN (‘D’,’F’,‘G’,’H’,’T’) 
         AND
         MATCH_PROCESS (
         STARTING ['A'] AS starting_A,
         LOOP['B'] AS loop_B,
         ENDING ['C'] AS ending_C
         CONNECTED BY DIRECT [starting_A, loop_B], DIRECT [loop_B, ending_C])>0
         
THEN (CASE WHEN "_CEL_PP_ACTIVITIES"."ACTIVITY_EN" = 'C'
             THEN 1.0 ELSE NULL END)
ELSE NULL 
END

Then it should only add a 1 next to the ‘C’ used in the sequence (the first one) so that the total sequences counted should match your original expectation, even if the 1s are in a slightly different position.

I hope this helps.

Best Regards,
Viana


#10

thank you again for your kind support.

However, I´m afraid that this does not solve my problem.
In the Celonis PQL guidelines we can read the following:

  • NODE : Node which has to be part once in the case, without any restrictions on where the node has to be.
  • STARTING : Node which has to happen at the beginning of a case.
  • ENDING : Node which has to happen at the end of a case.
  • LOOP : Node which occurs at least once but can also be repeated.

If I use the start or end in the syntax, then I give the command to select only the CASE starting and ending with a specific activity.
Not to mention that the MATCH_PROCESS function does not allow to count how often a specific sequence occurs. It just select the cases where the condition occurs at least once.

Moreover, it is important to identify the intere sequence and having the 1 in the last ‘C’, because then it is possible to go deeper in the analysis by calculating, for instance, the amount of time spent (and its equivalent in money).

I think there is yet some effort to do in finding the right formula.

Thanks again and I look forward to receiving further support on this challenging topic!
With kindest regards,
Francesca