Pull up over multiple tables

We have a data structure with case, activity, customer and contact tables.
A contact is linked to a customer (N:1),
A customer is linked to an activity (1:N),
An activity is linked to a case (N:1).

A case can be executed for multiple customers, hence this relationship structure of customers linked to activities.

The challenge/question is to count the number of contacts for a customer of a specific case.
Bonus challenge: only select those contacts that fall within the case runtime (which are attributes in the case table).

The problem we’re facing is the famous ‘no common parent between tables could be found’ error, and we believe the even more famous PULL UP functions could help us, combined with a DOMAIN_TABLE possibly. However, the relationship is such that we cannot just access a parent or grandparent table, but need to go to the parent (case) of a sibling table (activity) of our child table (contact), through customer which is our direct parent.
In our example Customer is the parent table of both contact and activity, and we would like to count contacts for each case.

In SQL this would be a join between the four tables, grouped by case identifier, counting the number of contacts that fall within a date range.

Hey Joos,

This truly sounds like another use-case for our infamous Pull Up Functions :wink: .
I could imagine, that something like this works:

Dimension: Case_Table.Case

KPI:

SUM(PU_FIRST(Activity_Table, PU_COUNT(Customer_Table, Contact_Table.Contact)))

It could even be, that, depending on your dimensions etc this works as KPI:

SUM(PU_COUNT(Customer_Table, Contact_Table.Contact))

Best Regards,

Benedict

P.S.: I have a solution in mind for the bonus challenge if this works, but you need to specify on how the case runtime is connected to the activities’ eventtimes & how they relate to which contact needs to be chosen. Thus e.g. if the case runtime is shorter than the timeframe between the first activity and last activity in a case etc.

1 Like

Thanks a lot for your reply @Bene!

Your proposed SUM(PU_COUNT(…,…)) solution works, and shows us the total number of contacts for the customer of a case! :+1:

Regarding the bonus challenge, which now becomes crucial.
The cases table has a start and end datetime field. And the contact end time should lie after case start, and before case end.
I tried the ‘obvious’ by extending your PU_COUNT with these conditions, but it failed with the ‘no common parent’ error:
SUM(PU_COUNT("CUSTOMER", "CONTACT"."ID", ("CONTACT"."ENDDATETIME" >= "CASE"."CREATIONDATETIME" AND "CONTCT"."CONTACTENDDATETIME" <= "CASE"."ENDDATETIME")))

Using the DATE_BETWEEN function fails with an ‘invalid PQL expression’ error.

The dimension currently is case, which we might change, but for now let’s try with case.
If need be we can take the first/last activity time to select the relevant contacts.

Looking forward to your bonus answer :smiley:

Hey Joos!

So I had the fear you were going to say something like this and actually I don’t see a 100% solution for this problem. What could work is this:

SUM(
PU_COUNT(“CUSTOMER”, “CONTACT”.“ID”,
PU_FIRST(“CUSTOMER”, “CASE”.“CREATIONDATETIME” ) <= “CONTACT”.“ENDDATETIME”
AND “CONTACT”.“ENDDATETIME” <= PU_FIRST(“CUSTOMER”, “CASE”.“ENDDATETIME” ))

Best Regards,

Benedict

1 Like

Hi Bene,

EDIT: issue not solved, see following post(s).

And I was fearing your reply to state that it couldn’t be done :frowning:
However, it does seem to work! :partying_face: I did not use the case characteristics of start/end time but looked at the first/last activity and took that timestamp. I still need to thoroughly validate the results, but I get results, which are not always 0, etc.

Regardless of the outcome, I would already like to thank you very much for helping me/us get our head around this! If we ever meet, I owe you a :beers:!

Hmz, sorry, posted the good news too soon :frowning: (the :beers: offer still stands though).

The PU_FIRST statements take the first case creation datetime of all cases for the customer. While this should be the creation datetime of the ‘current’ case (i.e. row). But changing to case of course raises the no common parent error.

I also get 0 as a result for most cases…

I’ll give it some more thought over the weekend and during coming week.
Any hints/tips/ideas/brainstorms in the meantime are more than welcome!

Haha, I will follow up on that :beer:

Yeah, I was fearing something like this. If you think about, it absolutely makes sense that the wrong information is drawn. If you take the event time over a PU_FIRST / PU_LAST statement you take the first / last respective event time for one customer out of all events he took part in and then paste it in all their contacts, which is not the information you want to use. Unfortunately the same thing will happen with a double PU_FIRST in order to get the CASE level information.

This is why we at Celonis try to configure our Data Models so everything references to the CASE table. You Could try this as a final solution, by creating a 1:N (CASE:Customer) Relationship to the CASE table with each customer for each case

I would like to provide an update on how we solved this issue of joining tables with an N:M (or many-to-many) relationship as it might help others.

The situation was as follows:


As in, we have cases with process steps, process steps are executed for 1 client, and a client has 0 or more contacts.
This makes case the parent of proces step (as one case has multiple process steps or children, and process step refers to one parent case).
Celonis has issues when traversing a path where the arrows are reversing more than once. So we could do everything we wanted considering the tables case, processtep, client, processType and procesCluster. Issues started when Contact was added, as is indicated by the red dotted arrow.

Eventually we solved it by adding an additionally generated view ‘Case_Contact’. In this view contacts are linked to cases, such that each row in case_contact refers to exactly one case if the contact was within 1 year before case start and one year after case end.
The downside is that case_contact now contains duplicated contacts, for example for cases that ran almost at the same time.

Using the case_contact view we can determine cases which had more than x contacts within y months before and z months after case runtime (as long as y and z are max 1 year). We can then get those clients that had at least one of these cases, and thus an issue.

For us this is a workable workaround, but we will encounter more similar issues, and the current workaround has its limitations. I would therefore like to gather fellow customers that have this use case and issue, such that we can convince Celonis to provide us with more advanced table joining features :slight_smile:
For now, I hope this helps fellow customers in working around the issue.
Any further insights are appreciated!

1 Like