Dear Celonis team.
First, let me congratulate you on continuously improving this wonderful tool.
RANGE_APPEND was a much-anticipated and much-needed function for me. However, it misses one small detail that I was expecting: Gap Filling and Interpolation (GFI), like in the Vertica TIMESERIES clause (https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/GapFillingAndInterpolationGFI.htm).
The Problem
When showing a series of date-related records, let's say the daily account balance of my banking customers, I need to fill in not just a date value for each day, but also to assume the same account balance as the previous (non-null) record if the balance hasn't changed on a specific date. RANGE_APPEND does part of it: it fills the gaps on the days that don't exist, but assumes 0 or null for the KPI values in the columns of my data table. This is OK, but regretfully not enough.
I even tried to use the LAG function, hoping it would match each date value generated by RANGE_APPEND with the previous non-null value of my table. But it didn't.
The Current Solution
To solve my problem, which is trivial in time series analysis, I currently have a table, built in the Event Collection using Vertica's TIMESERIES clause with constant interpolation, that has one record per day, per customer, per credit account. As you can imagine, this already amounts to 500 million records for just one year of data. It's becoming unmanageable and is taking up most of the contracted storage in our IBC licence, which I will need for my other data models.
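For reference, this is roughly the kind of Vertica transformation I mean; a simplified sketch where the table and column names (account_balances, customer_id, account_id, balance, balance_date) are only illustrative, not our actual schema:

-- One row per day, per customer, per credit account,
-- carrying the last known balance forward (constant interpolation).
SELECT
    slice_time                       AS position_date,  -- generated daily timestamp
    customer_id,
    account_id,
    TS_FIRST_VALUE(balance, 'const') AS balance         -- repeat the last non-null balance
FROM account_balances
TIMESERIES slice_time AS '1 day'
    OVER (PARTITION BY customer_id, account_id ORDER BY balance_date);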
Please, please consider enhancing RANGE_APPEND with a new parameter that would allow us to choose the behavior for the values in the other columns of the table: assume 0 or null (the current behavior), assume the last valid value (constant interpolation), or even linear interpolation.
Thank you.
Best regards,
Joaquim
Hi Joaquim,
thanks for your (mostly) positive feedback, we really appreciate it!
We didn't know about this kind of use case when developing the function. Still, extending RANGE_APPEND as you suggested would create new challenges, because it would require joining generated values to a different, existing value. This could lead to odd behavior when filtering, for example.
However, I think we can still solve the constant interpolation case. The idea is to use RUNNING_TOTAL, so that the previous value is kept for the gaps. Inside the RUNNING_TOTAL, we therefore need to return the difference to the previous value rather than the actual value; this difference can be retrieved using LAG. Since the running total of these differences reconstructs the last known value, dates where nothing happened simply keep the previous value.
For a simple case count, this would be the query I came up with:
RUNNING_TOTAL(
    CASE
        WHEN COUNT_TABLE("Cases") = 0 THEN 0
        ELSE COUNT_TABLE("Cases") - COALESCE(
            LAG(
                CASE WHEN COUNT_TABLE("Cases") = 0 THEN NULL ELSE COUNT_TABLE("Cases") END
            ), 0)
    END
)

Let's take the third chart with the ticket priorities in the example that I've linked in this post:

[Linked topic: How to create a date range - Codes & Statements (RANGE_APPEND announcement; docs: help.celonis.cloud/help/display/CIBC/RANGE_APPEND, example: https://pql.eu-1.celonis.cloud/process-mining/public/105139f8-2fa6-4ef2-b2e0-c33299e3f7e7/#/frontend/documents)]
On the right side of the following screenshot, you can see the result of the query above:

[screenshot]

Let me know what you think, and thanks again for your input!
Cheers
David
Hi, David.
Thank you very much for your reply. This was definitely one more step towards solving the whole challenge.
The only thing missing is carrying forward a previous value that is not an aggregation, i.e., the actual value (number, string, date) that lies in the LAG record.
I tried, but couldn't do it. The fictional date created with RANGE_APPEND always returns a null value for the column I'm trying to collect.
Cheers,
Joaquim
Hi Joaquim,
so you want to fill gaps without using an aggregation function? That is, the use case is to keep all values from the original column (not aggregated, so it can contain duplicates) and fill the gaps in there?
Example:
Source Table:

date          value
01.01.2020    5
01.01.2020    7
02.01.2020    3
04.01.2020    8

What you expect:

date          value
01.01.2020    5
01.01.2020    7
02.01.2020    3
03.01.2020    3
04.01.2020    8

Is that what you want?
Cheers
David
Hi, David.
That's exactly it!
This would solve my problem of having to pre-build such a table, as I am doing now.
Here's your example applied to our Credit Recovery Process, which will give you a perspective on how much we could save in record count and storage:
Original Table:

date_start    date_end      Total Credit    Default Credit    Department
25.03.2020    26.03.2020    50.000          0                 Branch
27.03.2020    01.04.2020    50.000          100               Credit Recovery Dept.

Generated Table (Daily Position) - currently created in the transformation SQL:

date          Total Credit    Default Credit    Department
25.03.2020    50.000          0                 Branch
26.03.2020    50.000          0                 Branch
27.03.2020    50.000          100               Credit Recovery Dept.
28.03.2020    50.000          100               Credit Recovery Dept.
29.03.2020    50.000          100               Credit Recovery Dept.
30.03.2020    50.000          100               Credit Recovery Dept.
31.03.2020    50.000          100               Credit Recovery Dept.
01.04.2020    50.000          100               Credit Recovery Dept.

And this is actually generated for each credit product of each customer: 620 million records (220 GB). The original table has 68 million records (18 GB).
Since this data is only used during the analysis of the process, we would save a lot in storage and processing.
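For completeness, the transformation that currently builds the daily-position table is essentially an expansion of each [date_start, date_end] range into one row per day. A simplified sketch of that idea, where credit_positions and calendar_dates are illustrative names rather than our actual transformation:

-- Expand each credit position into one row per calendar day it covers.
SELECT
    d.calendar_date       AS "date",
    c."Total Credit",
    c."Default Credit",
    c."Department"
FROM credit_positions AS c
JOIN calendar_dates   AS d                          -- helper table with one row per day
    ON d.calendar_date BETWEEN c.date_start AND c.date_end;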
Cheers,
Joaquim
Hi Joaquim,
thanks for your input! We did not know about this use case until now, and always assumed that such a range-filling function would be used in combination with an aggregation function.
I think that your problem indeed cannot be solved with RANGE_APPEND.
I will note your challenge as a feature request, and get back to you if we have additional questions.
Cheers
David
Hi, David.
Thank you for your feedback.
I'll be following this.
If your development team could deliver this feature, it would be great.
Again, take your inspiration from the Vertica documentation on constant interpolation. The challenge would be to do the same in PQL.
Best regards,
Joaquim
