Feature of the week series #2 - Variant Clustering


Dear Celonis Community!

Last week, we introduced the Feature of the week series with Analysis Shortcuts. This week we want to share the new Variant Clustering capability with you.

Variant Clustering
In the IBC and with the 4.5 release of Celonis 4, the new CLUSTER_VARIANTS function allows you to start your variant analysis by grouping similar variants. Here is a small example:

Let’s look at the Cable TV Order Process, which consists of the “Cable contract signed” activity and multiple activities related to payment.

In this example, we use the CLUSTER_VARIANTS function to distinguish groups of variants:

Using only the CLUSTER_VARIANTS function, we were able to identify three groups of variants:

  • Standard cases (“Cable contract signed” -> … -> “1st Direct Debit withdrawal”)
  • Unfinished cases (only “Cable contract signed”)
  • Cases that required extra work (Direct Debit was denied)

To identify these groups, we used the formula CLUSTER_VARIANTS( VARIANT( activity_column ), [...] ). The inner part of the formula, VARIANT( activity_column ), captures for every case its chain of activities:
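To make the idea of a variant concrete, here is a minimal Python sketch of what "one chain of activities per case" means. It is only an illustration, not how VARIANT() is implemented: the `variants_per_case` helper, the tuple layout of the event log and the ", " separator are all assumptions made for this example.

```python
from collections import defaultdict


def variants_per_case(events):
    """Build one activity chain per case.

    events: iterable of (case_id, timestamp, activity) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_case = defaultdict(list)
    # Order events by case and then by time, so each chain is chronological.
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        by_case[case_id].append(activity)
    # The ", " separator is an assumption for readability.
    return {case_id: ", ".join(chain) for case_id, chain in by_case.items()}


events = [
    ("A", 1, "Cable contract signed"),
    ("A", 2, "1st Direct Debit withdrawal"),
    ("B", 1, "Cable contract signed"),
]
variants = variants_per_case(events)
```

Case A ends up with a two-step chain, while case B only contains "Cable contract signed", which corresponds to the unfinished cases mentioned above.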

The outer part, CLUSTER_VARIANTS( [...], density_variable, density_radius_variable ), internally projects these variants into a multi-dimensional space based on how similar they are. It then groups variants that lie close to each other and assigns a number to each group. (Variants that it cannot assign to any group are numbered -1.) The density_variable is a threshold for how many variants must lie within a certain area of that space for them to be grouped. The size of that area is defined by the density_radius_variable.
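The grouping described here resembles density-based clustering. The following Python sketch shows the general idea under stated assumptions: it uses a DBSCAN-style procedure with edit distance between activity chains as the similarity measure. Neither the algorithm nor the distance measure is confirmed by this post, so treat it as an illustration and not as the actual implementation. The names `edit_distance`, `cluster_variants`, `min_pts` and `radius` are hypothetical; `min_pts` plays the role of the density_variable and `radius` the role of the density_radius_variable. The activity names other than the two taken from the example are made up, and the sketch counts a variant as its own neighbour and numbers groups from 0, both of which are also assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two activity chains (tuples of names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x by y
        prev = cur
    return prev[-1]


def cluster_variants(variants, min_pts, radius):
    """Return one label per variant: a group number, or -1 for no group."""
    n = len(variants)
    # Neighbours of i: all variants within `radius` edits (including i itself).
    neighbours = [
        [j for j in range(n) if edit_distance(variants[i], variants[j]) <= radius]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # too sparse here; may still join a group later
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # sparse variant on the edge of a group
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # dense variant: keep growing
        cluster += 1
    return labels


chains = [
    ("Cable contract signed", "1st Direct Debit withdrawal"),
    ("Cable contract signed", "Check", "1st Direct Debit withdrawal"),
    ("Cable contract signed", "Review", "1st Direct Debit withdrawal"),
    ("Cable contract signed", "Denied", "Reminder", "Call", "Manual", "Paid"),
]
labels = cluster_variants(chains, min_pts=2, radius=1)
```

With these inputs, the first three chains are within one edit of each other and form a single group, while the long fourth chain has no close neighbour and is labelled -1. Raising `radius` or lowering `min_pts` makes the grouping more permissive.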

In the analysis above, a possible next step would be to tweak the VARIANT() input further by remapping or concatenating activity values. @Hans.van.der.Zandt, in this thread, you were already talking about clustering of variants. What do you think? We are happy to hear your and the community’s feedback on what challenges you face when trying to get insights from similar variants!

Your Celonis Product Team



Dear M.Kohl,

Thanks for your input and proposal concerning variants.
Very much appreciated.
When I saw your analysis I was not sure whether we have the same user story we would like to “solve”.
Below is a proposal for how a more visual graph of the distribution of the variants could look along two dimensions. Taking your example, I would expect two clusters, but they may vary depending on the attribute (= dimension) chosen. Personally, I would put a “dropdown” on these dimensions so you can see where you have the highest / lowest spread of variations. This could also be a nice tool for finding the best scenario / variant for RPA…
What do you think?


Have fun!
Best Regards,


Hi Hans,

I saw you would like to use a dimension (a column from a non-activity table) as the determining feature to group your variants. The CLUSTER_VARIANTS function only uses patterns in the event log, without looking at the dimensions. Did you also consider grouping the variants only by process features (e.g. order of activities)?

You can achieve the visual example you sent with the KMEANS function. Here I recreated your example in Celonis, using KMEANS:


Would be interested to hear your thoughts on the first question :slight_smile:

Kind regards,


Dear Max

Thanks for your question. The answer depends on the user story you are looking at.
To get an overview of the main different processes (= variants), your solution is a great start and probably the starting point.
My user story tries to find out which attribute drives the variants… This would be the second step for me.

Have fun!