Process Mining: Event Log Basics

In this article, I would like to give you some practical examples to help you understand Event Logs and how to identify them. To begin with, here are the most frequently asked questions:

  1. Why do we need an Event Log? An Event Log is essential: process mining techniques cannot be applied without one.
  2. What is an Event Log? The simplest example is a CSV file that contains at least these three columns: Case ID, Activity and Timestamp. Hint: to remember these columns, think of the acronym CAT.
  3. How can I identify these columns in a CSV file? The easiest way to evaluate your CSV file is to sort your data first by the column you think is your Case ID and then by Timestamp, as shown in the screenshot below.
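If you prefer to try the sorting trick in code rather than in a spreadsheet, here is a minimal sketch using only the Python standard library. The column names (`case_id`, `activity`, `timestamp`), the timestamp format and the sample rows are assumptions made up for illustration; adapt them to your own file.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data: column names, times and format are assumptions.
SAMPLE_CSV = """case_id,activity,timestamp
LH2286,Boarding,2020-01-01 09:30
FN338,Boarding,2020-01-01 08:40
FN338,Check-in,2020-01-01 08:00
LH2286,Check-in,2020-01-01 09:00
"""


def sort_event_log(rows, case_col="case_id", time_col="timestamp",
                   time_format="%Y-%m-%d %H:%M"):
    """Sort rows by the candidate Case ID column, then by parsed timestamp."""
    return sorted(
        rows,
        key=lambda r: (r[case_col], datetime.strptime(r[time_col], time_format)),
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sorted_rows = sort_event_log(rows)

for row in sorted_rows:
    print(row["case_id"], row["timestamp"], row["activity"])
```

To read a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. Parsing the timestamp before sorting matters when the format does not sort correctly as plain text (for example, day-first dates).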

The result is the following:

This CSV file is a good Event Log because it follows the two Event Log principles:

  1. An Event Log can be seen as a collection of cases. For instance, in the screenshot above, our collection is FN338, GH2230, LH2286 and LH2306.
  2. A case can be seen as a clear sequence of activities. For example, case FN338 has a clear sequence of activities: Check-in, Boarding, Takeoff, Landing and Baggage claim.
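The two principles can also be expressed in code: group the events by Case ID (the collection of cases) and order each group by Timestamp (the sequence of activities). The sketch below is only an illustration; the tuple layout, the timestamps and the `build_traces` helper are assumptions, while the flight numbers and activities come from the example above.

```python
from collections import defaultdict

# Hypothetical events as (case_id, activity, timestamp); the times are made up.
events = [
    ("FN338", "Takeoff", "2020-01-01 09:10"),
    ("GH2230", "Check-in", "2020-01-01 10:00"),
    ("FN338", "Check-in", "2020-01-01 08:00"),
    ("FN338", "Baggage claim", "2020-01-01 11:45"),
    ("FN338", "Boarding", "2020-01-01 08:40"),
    ("GH2230", "Boarding", "2020-01-01 10:35"),
    ("FN338", "Landing", "2020-01-01 11:20"),
]


def build_traces(events):
    """Return {case_id: [activity, ...]} with activities in time order.

    Timestamps here are 'YYYY-MM-DD HH:MM' strings, which sort correctly
    as plain text; other formats would need to be parsed first.
    """
    traces = defaultdict(list)
    for case_id, activity, _timestamp in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    return dict(traces)


traces = build_traces(events)
for case_id, activities in traces.items():
    print(case_id, "->", ", ".join(activities))
```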

Let’s now look at some bad examples that don’t follow these principles.

Example 1: One Activity per Case.

Suppose we have the Event Log shown in the screenshot below.

Let’s apply the same trick: first we sort by the column we think is our Case ID and then by Timestamp, as shown below.

After sorting, we get the following table.

We can immediately see that this violates the second principle of an Event Log:

A case can be seen as a clear sequence of activities.

There is no sequence of activities here, because each case contains only one activity.
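This situation is easy to detect automatically: count the events per value of the candidate Case ID column and see which values occur only once. The following is a minimal sketch; the helper names and the sample values are hypothetical.

```python
from collections import Counter


def single_event_cases(case_ids):
    """Return the candidate case IDs that occur exactly once, sorted."""
    counts = Counter(case_ids)
    return sorted(case for case, n in counts.items() if n == 1)


def is_one_activity_per_case(case_ids):
    """True when every candidate case has exactly one event."""
    case_ids = list(case_ids)
    return bool(case_ids) and len(set(case_ids)) == len(case_ids)


# Hypothetical candidate column in which every value is unique.
bad_column = ["A1", "A2", "A3", "A4"]
# Hypothetical column in which cases repeat, as in a healthy Event Log.
good_column = ["FN338", "FN338", "FN338", "GH2230", "GH2230"]

print(is_one_activity_per_case(bad_column))
print(is_one_activity_per_case(good_column))
```

If the check returns True, the column is probably an event identifier rather than a Case ID, and another column may be a better candidate.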

Example 2: The same timestamp for all activities.

We have the Event Log shown in the screenshot below.

After applying the same trick (sorting by the column we think is our Case ID and then by Timestamp), we get the result shown below.

We can immediately see that this violates the second principle of an Event Log again:

A case can be seen as a clear sequence of activities.

There is no clear sequence of activities here, because all the activities of a case happen at the same time.
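A similar check can flag cases whose events all carry the same timestamp, so that no order can be derived from them. This is a sketch under assumptions: the tuple layout, the sample data and the `cases_without_order` helper are made up for illustration.

```python
from collections import defaultdict


def cases_without_order(events):
    """Return cases with several events that all share one timestamp.

    `events` is an iterable of (case_id, activity, timestamp) tuples.
    """
    stamps = defaultdict(set)
    counts = defaultdict(int)
    for case_id, _activity, timestamp in events:
        stamps[case_id].add(timestamp)
        counts[case_id] += 1
    return sorted(
        case for case in stamps
        if counts[case] > 1 and len(stamps[case]) == 1
    )


# Hypothetical log: FN338 has one timestamp for everything, GH2230 does not.
events = [
    ("FN338", "Check-in", "2020-01-01 00:00"),
    ("FN338", "Boarding", "2020-01-01 00:00"),
    ("FN338", "Takeoff", "2020-01-01 00:00"),
    ("GH2230", "Check-in", "2020-01-01 10:00"),
    ("GH2230", "Boarding", "2020-01-01 10:35"),
]

print(cases_without_order(events))
```

A common cause, in my experience, is a timestamp column that stores only the date or the export time instead of the moment each activity happened, so it is worth checking the source system before discarding the data.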

Example 3: Hundreds of activities for one case.

This is also wrong, because it violates the first principle:

An Event Log can be seen as a collection of cases.

In this example there is no collection, because we have only one case.
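To spot this quickly, compare the number of distinct cases with the number of events. The sketch below uses a hypothetical `summarize_cases` helper and made-up data.

```python
def summarize_cases(case_ids):
    """Return the number of events and of distinct candidate cases."""
    case_ids = list(case_ids)
    return {"events": len(case_ids), "cases": len(set(case_ids))}


# Hypothetical candidate column where every event has the same value.
candidate_column = ["PLANT-1"] * 300

summary = summarize_cases(candidate_column)
print(summary)

if summary["cases"] == 1:
    print("Only one case: this column does not split the log into cases.")
```

A single distinct value usually means the column describes the whole data set (a system, a site, a company) rather than one process instance.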

Let me know in the comments if I forgot something.