Ah! You almost nailed it with the new RANGE_APPEND function!

Dear Celonis team,

First, let me congratulate you on continuously improving this wonderful tool.

RANGE_APPEND was a much-anticipated and much-needed function for me. However, it is missing one “small” detail that I was expecting: Gap Filling and Interpolation (GFI), as in the Vertica TIMESERIES clause (https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/GapFillingAndInterpolationGFI.htm)

The Problem
When showing a series of date-related records, say the daily account balance of my banking customers, I need not only to fill in a “date value” for each day, but also to assume the same account balance as on the previous day (the last non-null record) if the balance hasn’t changed on a specific date. RANGE_APPEND does part of this: it fills the gaps for the days that don’t exist, but it assumes 0 or null for the KPI values of the columns in my data table. This is OK, but regretfully not enough.

I even tried to use the LAG function, hoping it would match each date value generated by RANGE_APPEND with the previous non-null value in my table, but it didn’t.

The Current solution
To solve my problem, which is trivial in time series analysis, I currently have a table, built in Event Collection using Vertica’s TIMESERIES clause with constant interpolation, that holds one record per day, per customer, per credit account. As you can imagine, this already amounts to 500 million records for just one year of data. It’s becoming unmanageable and is taking up most of the contracted storage in our IBC licence, which I will need for my other data models.
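
For illustration, this is roughly the shape of that transformation, following the GFI example in the Vertica docs linked above (all table and column names here are placeholders, not my real schema):

-- Placeholder names: account_balances, customer_id, account_id,
-- as_of_date, balance. Produces one row per day, per customer, per
-- account; days without a record repeat the last known balance
-- ('CONST' = constant interpolation; 'LINEAR' would interpolate
-- linearly instead).
SELECT
    slice_time::DATE AS day,
    customer_id,
    account_id,
    TS_FIRST_VALUE(balance, 'CONST') AS balance
FROM account_balances
TIMESERIES slice_time AS '1 day'
    OVER (PARTITION BY customer_id, account_id
          ORDER BY as_of_date::TIMESTAMP);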

Please, please consider enhancing RANGE_APPEND with a new parameter that lets us choose the behavior for the values in the other columns of the table: assume 0 or null (the current behavior), assume the last valid value (constant interpolation), or even apply linear interpolation.

Thank you.

Best regards,

Joaquim

Hi Joaquim,

Thanks for your mostly positive feedback; we really appreciate it!

We didn’t know about this kind of use case when developing the function. Still, extending RANGE_APPEND as you suggest would create new challenges, because it would require joining generated values with different existing values. This could lead to weird behavior when filtering, for example.

However, I think we can still solve the constant interpolation case. The idea is to use RUNNING_TOTAL: if, inside the RUNNING_TOTAL, we return the difference to the previous value rather than the actual value, the running total keeps the previous value across the “gaps”. The difference to the previous value can be retrieved using LAG.
For a simple case count, this would be the query I came up with:

RUNNING_TOTAL(
    CASE WHEN COUNT_TABLE("Cases") = 0 THEN 0
         ELSE COUNT_TABLE("Cases") - COALESCE(
                  LAG(
                      CASE WHEN COUNT_TABLE("Cases") = 0 THEN NULL
                           ELSE COUNT_TABLE("Cases")
                      END
                  ), 0)
    END
)

Let’s take the third chart with the ticket priorities from the example that I linked in this post. On the right side of the screenshot there, you can see the result of the query above.

Let me know what you think, and thanks again for your input!

Cheers
David

Hi, David.

Thank you very much for your reply. This was definitely one more step toward solving the whole challenge :slight_smile:

The only thing missing is carrying forward a previous value that is not an aggregation, i.e., the actual value (number, string, date…) that lies in the LAG record.

I tried, but couldn’t do it: the “fictional” date created by RANGE_APPEND always returns a null value for the column I’m trying to retrieve.

Cheers,

Joaquim

Hi Joaquim,

So you want to fill gaps without using an aggregation function? That is, the use case is to keep all values from the original column (not aggregated, so it can contain duplicates) and fill the gaps in there?

Example:

Source Table:

date        value
01.01.2020  5
01.01.2020  7
02.01.2020  3
04.01.2020  8

What you expect:

date        value
01.01.2020  5
01.01.2020  7
02.01.2020  3
03.01.2020  3
04.01.2020  8

Is that what you want?

Cheers
David

Hi, David.

That’s exactly it :slight_smile:
This would solve my problem of having to pre-build such a table, as I am doing now.
Here’s your example applied to our Credit Recovery Process, which should give you a perspective on how much we could save in record count and storage:

Original Table

date_start   date_end     Total Credit  Default Credit  Department
25.03.2020   26.03.2020   50.000€       0€              Branch
27.03.2020   01.04.2020   50.000€       100€            Credit Recovery Dept.

Generated Table (Daily Position) - Currently created in the transformation SQL

date        Total Credit  Default Credit  Department
25.03.2020  50.000€       0€              Branch
26.03.2020  50.000€       0€              Branch
27.03.2020  50.000€       100€            Credit Recovery Dept.
28.03.2020  50.000€       100€            Credit Recovery Dept.
29.03.2020  50.000€       100€            Credit Recovery Dept.
30.03.2020  50.000€       100€            Credit Recovery Dept.
31.03.2020  50.000€       100€            Credit Recovery Dept.
01.04.2020  50.000€       100€            Credit Recovery Dept.
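
The generation itself is conceptually simple. I build it with Vertica’s TIMESERIES as described above, but since each source row already carries an explicit date range, even a plain calendar join could produce it. A minimal sketch, with hypothetical names (credit_positions for the original table, calendar_days as a helper table holding one row per calendar date):

-- Sketch only: explodes each [date_start, date_end] range into one
-- row per day by joining a calendar table.
SELECT
    d.calendar_date AS position_date,
    p.total_credit,
    p.default_credit,
    p.department
FROM credit_positions p
JOIN calendar_days d
    ON d.calendar_date BETWEEN p.date_start AND p.date_end;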

And this is actually generated for each credit product of each customer: 620 million records :astonished: taking up 220 GB, while the original table has 68 million records taking up 18 GB.

Since this data is used during the analysis of the process, we would save a lot in storage and processing.

Cheers,
Joaquim

Hi Joaquim,

Thanks for your input! We did not know about this use case until now, and always assumed that such a range-filling function would be used in combination with an aggregation function.
I think your problem indeed cannot be solved with the current RANGE_APPEND.
I will note your challenge as a feature request and get back to you if we have additional questions.

Cheers
David

Hi, David.

Thank you for your feedback.

I’ll be following this.

It would be great if your development team could deliver this feature.

Again, take your inspiration from the Vertica documentation on constant interpolation. The challenge would be to do the same in PQL.

Best regards,

Joaquim