Skip to main content

Hi guys!

 

I’m trying to apply a particular filter to a dataset with many activities. 

The scenario is as follows.

 

The raw event log contains activities done in a general hospital for over a year.

If I want to get the process for a specific activity, I would normally just filter ‘Include only cases that contain ‘Knee Replacement Surgery’. 

 

However, because there is so much going on in the hospital, such a filter is not enough. For example, if a person underwent a knee surgery is june 2021, the process map will also try to include other procedures that this person did in january 2021 or december 2021, even though these have nothing to do with the knee replacement surgery.

 

For that reason, I’m trying to narrow the filter down as follows:

 

“Provide an event log that only includes data of a maximum of 2 weeks in the past after the activity ‘Knee replacement surgery’ and a maximum of 2 weeks after this same activity.”

 

Note that this is NOT the same as applying a general time filter on the event log (for example ‘only include May, June and July).

 

The end result should still contain any occurrence of this activity in the full year of 2021. But if a patient has undergone the activity on June 15, 2022, only the ‘June’ activities for that patient will be included (2 weeks backwards and 2 weeks forward). If that some patient had undergone a medical visit in January or December, those will be discarded.

 

If another patient in that same dataset has undergone the knee replacement in October 1, only the last 2 weeks of September and the first 2 weeks of October should be included for this case.

 

The intention of this procedure is to limit the process maps as much as possible to only include the activities directly surrounding the knee procedure, and discard the many other medical activities the same patients might undergo that year. It is assumed that this will make our process map for specific hospital procedures a lot easier to oversee (less like a spaghetti process map).

 

So far, I have not been able to find a PQL solution for this question. I would therefore appreciate it very much if anyone can point me in the right direction. Thank you in advance!

 

 

Hmmm... what is your case id? Because if the caseID is the identifier of the patient (i.e. their social security number or national ID card)... then you will get all the procedures (related or unrelated) to that patient, if that patient had a "knee replacement surgery".

 

IF your case id were the "check in registration number" (i.e the code you get internally each time to go into a hospital for a procedure)... then you will ONLY get those cases were the patient had at least a knee replacement surgery (with to without other related procedures, like anaesthesiology.... ). And of course patientID will be a property of the case

 

Just my 0,02 €


Hmmm... what is your case id? Because if the caseID is the identifier of the patient (i.e. their social security number or national ID card)... then you will get all the procedures (related or unrelated) to that patient, if that patient had a "knee replacement surgery".

 

IF your case id were the "check in registration number" (i.e the code you get internally each time to go into a hospital for a procedure)... then you will ONLY get those cases were the patient had at least a knee replacement surgery (with to without other related procedures, like anaesthesiology.... ). And of course patientID will be a property of the case

 

Just my 0,02 €

Thank you very much for your reply Guillermo!

 

Indeed the registration number would be an option.

 

However, the registration number method will not work if an insurer wants to monitor patient pathways in and outside of the hospital. For example, the same patient undergoing a knee replacement surgery at the hospital, might later get a painkiller prescription at a pharmacy outside of the hospital, and also receive physical therapy for the knee outside of the hospital.

 

In such a case, the registration number of the hospital will not be useful as the case ID, because that is only relevant within the 'eco system' of the hospital. However, we still want to narrow our event log to only 2 weeks before and 2 weeks after this specific event. This would allow me to also capture any relevant activities in and outside the hospital within a specific time frame.

 

BTW I should clarify, I work with a claims data set of an insurer. The hospital data is part of this claims data set, but other types of providers that are not the hospital can also be included in the same dataset (because the insurer gets all the claim data from the insure, inside and outside of the hospital).

 

Thanks for your reply


Thank you very much for your reply Guillermo!

 

Indeed the registration number would be an option.

 

However, the registration number method will not work if an insurer wants to monitor patient pathways in and outside of the hospital. For example, the same patient undergoing a knee replacement surgery at the hospital, might later get a painkiller prescription at a pharmacy outside of the hospital, and also receive physical therapy for the knee outside of the hospital.

 

In such a case, the registration number of the hospital will not be useful as the case ID, because that is only relevant within the 'eco system' of the hospital. However, we still want to narrow our event log to only 2 weeks before and 2 weeks after this specific event. This would allow me to also capture any relevant activities in and outside the hospital within a specific time frame.

 

BTW I should clarify, I work with a claims data set of an insurer. The hospital data is part of this claims data set, but other types of providers that are not the hospital can also be included in the same dataset (because the insurer gets all the claim data from the insure, inside and outside of the hospital).

 

Thanks for your reply

I see, your use case is insurance related, not health-care. My bad

In that case yes, the caseid should be the patient.

 

 


Intuitively, I can't think of a solution in PQL.

 

However, this is the kind of issue we would solve within the data transformation as an SQL-Script.

Is this an option for you or are you dependant on beeing able to change the operation as well as the timespan before/after in a dynamic matter?


Hi there Patrick! Thanks for your reply. I'm very ok with solving this in SQL instead of PQL. If you have any suggestions as to how to approach this with SQL, please let me know!

 

Kind regards,

 

Dennis


I don't have the time to write it all out in SQL-Code right now but maybe the "pseudocode" already helps.

 

  1. Add the timestamp of each KneeOperation to the Activity-Table as additional columns to every row (you might need multiple columns for multiple operations; I guess no person has more then 5 knee-operations and or multiple operations within 2 weeks).
  2. Check if acitivtiy is within 2 weeks before or after any of the timestamps (DATEDIFF is the function you are looking for).

 

Hope it helps!

 

If you need additional support and If it is quite urgent/important, reach out via LinkedIn and I'll see, what I can do. https://www.linkedin.com/in/patrickgroth/

 

Best Patrick

 


I don't have the time to write it all out in SQL-Code right now but maybe the "pseudocode" already helps.

 

  1. Add the timestamp of each KneeOperation to the Activity-Table as additional columns to every row (you might need multiple columns for multiple operations; I guess no person has more then 5 knee-operations and or multiple operations within 2 weeks).
  2. Check if acitivtiy is within 2 weeks before or after any of the timestamps (DATEDIFF is the function you are looking for).

 

Hope it helps!

 

If you need additional support and If it is quite urgent/important, reach out via LinkedIn and I'll see, what I can do. https://www.linkedin.com/in/patrickgroth/

 

Best Patrick

 

Awesome Patrick! This pseudo code already gives me enough ideas on how to approach the challenge. Will give it a try and update in this thread. Thank you very much!


Reply