Dear Celonis Community!
Last week, we introduced the Feature of the week series with Analysis Shortcuts. This week we want to share the new Variant Clustering capability with you.
In the IBC and with the 4.5 release of Celonis 4, the new CLUSTER_VARIANTS function allows you to start your variant analysis by grouping similar variants. Here is a small example:
Let’s look at the Cable TV Order Process, which consists of the “Cable contract signed” activity and multiple activities related to payment.
In this example, we use the CLUSTER_VARIANTS function to distinguish groups of variants:
Using only the CLUSTER_VARIANTS function, we were able to identify three groups of variants:
- Standard cases (“Cable contract signed” -> … -> “1st Direct Debit withdrawal”)
- Unfinished cases (only “Cable contract signed”)
- Cases that required extra work (Direct Debit was denied)
To identify these groups, we used the formula CLUSTER_VARIANTS( VARIANT( activity_column ), [...] ). The inner part of the formula, VARIANT( activity_column ), captures for every case its chain of activities.
The outer part, CLUSTER_VARIANTS( [...], density_variable, density_radius_variable ), internally projects these variants onto a multi-dimensional space, based on how similar they are. It then groups variants that are close to each other and assigns a number to each group. (Variants that it can’t assign to a group are numbered -1.) The density_variable is a threshold for how many variants need to lie inside a certain area of that space for them to be grouped. The size of that area is defined by the density_radius_variable.
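To make the density idea more tangible, here is a minimal Python sketch of density-based grouping of variants. It is not the Celonis implementation: the post does not say which algorithm or similarity measure CLUSTER_VARIANTS uses internally, so the DBSCAN-style procedure, the edit distance between activity chains, the helper names (edit_distance, cluster_variants, min_points, radius) and the single-letter activity labels are all assumptions made for illustration.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance between two activity chains (assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster_variants(variants, min_points, radius):
    """DBSCAN-style grouping: one label per variant, -1 if no group fits.

    min_points plays the role of density_variable and radius the role of
    density_radius_variable (hypothetical mapping, for illustration only).
    """
    n = len(variants)
    # For each variant, all variants within `radius` (including itself).
    neighbours = [
        [j for j in range(n) if edit_distance(variants[i], variants[j]) <= radius]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_points:
            labels[i] = -1  # too sparse; may still be claimed by a later group
            continue
        labels[i] = cluster_id
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # sparse point on the edge of a group
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_points:
                queue.extend(neighbours[j])  # dense point: keep growing the group
        cluster_id += 1
    return labels


# Placeholder activity chains, not taken from the Cable TV Order Process data.
variants = [
    ("A", "B", "C", "D"),
    ("A", "B", "D"),
    ("A", "C", "B", "D"),
    ("A", "X", "Y", "X", "Y", "Z"),
    ("A", "X", "Y", "X", "Z"),
    ("A", "X", "Y", "Y", "Z"),
    ("Q", "R", "S", "T", "U", "V", "W", "Q"),
]
labels = cluster_variants(variants, min_points=2, radius=1)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this sketch, raising min_points or lowering radius makes it harder for variants to form a group, so more of them end up with -1. The real function may react differently, so treat this only as an intuition aid.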
In the analysis above, as a next step, we could try to tweak the VARIANT() input further by remapping or concatenating activity values.
@Hans.van.der.Zandt, in this thread, you were already talking about clustering of variants. What do you think? We are happy to hear your and the community’s feedback on the challenges you face when trying to get insights from similar variants!
Your Celonis Product Team