Snap User Story: Formula 1 Analysis

I like motorsports, I like car racing, I like to go to race tracks or watch races on TV. So I thought "How cool would it be to analyze Formula 1 seasons, races, and drivers like polish Grand Prix legend Robert Kubica?” Although I’m a Celonis employee, I of course couldn’t use the software for private purposes like this. However, that changed with the release of Celonis Snap, which is free to use for everybody. Here’s how I created this Analysis using my personal Snap Account.

First of all, I needed to get data. I found a very nice dataset on https://ergast.com/mrd/db/ . The dataset consists of multiple tables containing up-to-date F1 race data from the very first Grand Prix in 1950 until today. I created the tables in my Postgres database based on the schema information provided on the website, and dumped the csv files into it.

I dumped them into a relational database instead of uploading them directly to Snap because I needed to do some transformations before, for example, I wanted to create an eventlog, which is easiest in SQL. For this, I first had to decide: What is a Case? What is my Case Key? Which Activities do I have? After exploring the dataset a bit I decided to use the results-table as my case table. In this table, every row represents a result of a specific driver in a specific race. The case key consists of the race ID, the driver ID, and the car number. The latter one is required because in the early days, some drivers started the race in one car and changed the car mid-race, meaning that they have two results-entries for a specific race, which alone would not be a unique case identifier.

The Activities should represent how the corresponding race weekend looked like: The current F1 race weekend format contains practice sessions (which are not in the data, so I didn’t care about them), a qualifying consisting of 3 sessions, where a driver advances to the next session if he’s fast enough, as well as the race itself. So by looking at the qualifying times I determined in which qualifying sessions the driver took part in, so later in the Process Explorer, I would be able to see if a driver usually makes it to the third round of qualifying, or if he usually doesn’t advance, or if there was a race weekend where he didn’t post a time in qualifying at all. Based on that I created the eventlog as a view in SQL (postgres syntax) :

CREATE VIEW _cel_f1_activities AS
 SELECT activities._case_id,
    activities.activity,
    activities.eventtime,
    activities.sorting
   FROM ( SELECT concat(r.driverid, r.number, r.raceid) AS _case_id,
            'Qualifying 1'::text AS activity,
            (( SELECT to_timestamp(races.date || ''::text, 'YYYY-MM-DD FMHH24:FMMI:FMSS'::text)::timestamp without time zone AS to_timestamp
                   FROM races
                  WHERE races.raceid = r.raceid)) - '1 day'::interval AS eventtime,
            1 AS sorting
           FROM results r
          WHERE (EXISTS ( SELECT *
                   FROM qualifying q
                  WHERE r.raceid = q.raceid AND r.driverid = q.driverid AND q.q1 IS NOT NULL AND q.q1::text <> '\N'::text))
        UNION ALL
         SELECT concat(r.driverid, r.number, r.raceid) AS _case_id,
            'Qualifying 2'::text AS activity,
            (( SELECT to_timestamp(races.date || ''::text, 'YYYY-MM-DD FMHH24:FMMI:FMSS'::text)::timestamp without time zone AS to_timestamp
                   FROM races
                  WHERE races.raceid = r.raceid)) - '1 day'::interval AS eventtime,
            2 AS sorting
           FROM results r
          WHERE (EXISTS ( SELECT *
                   FROM qualifying q
                  WHERE r.raceid = q.raceid AND r.driverid = q.driverid AND q.q2 IS NOT NULL AND q.q2::text <> '\N'::text))
        UNION ALL
         SELECT concat(r.driverid, r.number, r.raceid) AS _case_id,
            'Qualifying 3'::text AS activity,
            (( SELECT to_timestamp(races.date || ''::text, 'YYYY-MM-DD FMHH24:FMMI:FMSS'::text)::timestamp without time zone AS to_timestamp
                   FROM races
                  WHERE races.raceid = r.raceid)) - '1 day'::interval AS eventtime,
            3 AS sorting
           FROM results r
          WHERE (EXISTS ( SELECT *
                   FROM qualifying q
                  WHERE r.raceid = q.raceid AND r.driverid = q.driverid AND q.q3 IS NOT NULL AND q.q3::text <> '\N'::text))
        UNION ALL
         SELECT concat(r.driverid, r.number, r.raceid) AS _case_id,
            'Race'::text AS activity,
            ( SELECT to_timestamp(
                        CASE
                            WHEN races."time" IS NOT NULL THEN (races.date || ' '::text) || races."time"
                            ELSE races.date || ''::text
                        END, 'YYYY-MM-DD FMHH24:FMMI:FMSS'::text)::timestamp without time zone AS to_timestamp
                   FROM races
                  WHERE races.raceid = r.raceid) AS eventtime,
            4 AS sorting
           FROM results r) activities
  ORDER BY activities._case_id, activities.eventtime, activities.sorting;

I created a couple of other views, for example views that add the case key column to some tables (including the Case table results itself), and for some time(stamp) conversions.

I exported all views (and all tables that I didn’t change, i.e. where I didn’t add a view) by executing SELECT * FROM table_or_view and storing the result in a csv file.

In Celonis Snap, I created a new Data Pool in Event Collection, and uploaded all the files I created. I created a new Data Model, added the uploaded tables to it, and set alias names for all tables. Then I connected all tables based on their foreign key relationship. This is how my Data Model finally looks like:

After a successful Data Model load, it was time to create a new Workspace and a new Analysis inside it. When building the Analysis, I first needed to explore all the columns and information the dataset provides in order to come up with some ideas about what to show in the Analysis. However, I didn’t have to care about joins anymore, which made it very easy to quickly build some nice charts and display some interesting Kubica Performance Indicators (KPIs) already.

As the Process Analytics functionality is not restricted in any way in Celonis Snap, I could use all the features of the IBC enterprise version here!

So if you want to analyze your favorite sports, sign up for Celonis Snap, find a nice dataset, setup a database, use SQL to generate your eventlog, and then upload everything to Celonis Snap. As soon as your Data Model is ready, you can add many nice charts to your analysis. You can also invite friends and colleagues to collaborate, and you can share a public viewer link with the world.

Now, there’s only one thing left to say: Keep racing!

4 Likes