Moving average

Hey,

I am calculating a moving average of [value] (n = 10) across a table. To reduce computation, I am using an app formula on a physical column, so [value_moving_avg] is calculated once during record entry and then saved.

I have a workflow which triggers on any update or delete which changes the [value]. The workflow identifies the following 9 records, and updates updates their [value_moving_avg] so the dataset remains accurate.

My moving average formula is shown below:

AVERAGE(
  SELECT(
    table[value],
    IN(
      [id],
      TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)
    )
  )+LIST([value])
)

The identification of the rows to include in the moving average calculates rather quickly (the portion of the formula nested inside the IN()).

The challenge is, taking the referenced rows and turning them into [value] to be averaged. The IN() formula is very slow in this implementation. Now that my dataset has grown, the workflow rule hangs or fails entirely.

Any thoughts on how to optimize something like this?

@Aleksi please donโ€™t say its not possible at this time

Solved Solved
0 6 1,011
1 ACCEPTED SOLUTION

By the nature of the calculation, it will require multiple rows. One row for every row in the table, sort of as described here: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/

Maybe it will provide more clarity; here is what I am producing for my users:

The bigger problem is AppSheets operations for filtering, ordering, and selecting data from a list only returns a list of primary keys. There isnt a simple way to convert this list of primary keys back to a piece of data stored along with them. Its unfortunate that we canโ€™t specifically return a column when using operators like ORDERBY(). Rarely am I wanting a list of primary keys, i usually want some associated value.


Iโ€™ve found a solution to streamline the calculation, although its clunky in implementation it works.

The solution requires two physical columns: one column is EnumList base type REF, and holds a list of the 10 related records I would like to average. Essentially, this part of the bigger formula:

TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)

Then, I have a second formula which looks up each associated value directly, rather than using IN(). Essentially,

AVERAGE(LIST(
   [value],
   LOOKUP(INDEX([_THISROW].[EnumListREF],1),โ€tableโ€,โ€idโ€,โ€valueโ€),
   ...,
   LOOKUP(INDEX([_THISROW].[EnumListREF],9),โ€tableโ€,โ€idโ€,โ€valueโ€)
))

Doing the calculation this way has it performing an order of magnitude better. I still have a workflow rule to update associated records on updates or deletes.


I should also note, in case anyone uses this post in the future, that there are other options. Each has some benefits and downsidesโ€ฆ to name two:

  1. Perform the rolling average calculations on records as they are created, but rely on a time-scheduled report to re-validate the accuracy of rolling averages once a day. Users will not be impacted by the performance of the formula, but they may have to live with temporarily invalid data whenever they edit old records.
  2. Perform the rolling average calculation in the backend database. In excel or spreadsheet implementations, the formula becomes complex and slow unless you can guarantee the data is being saved in the proper order. Definitely possible in relational db; how well it performs Iโ€™m not sure. Maybe could be handled with a table view.

Cheers @Aleksi ! Would love to know if you have any other methods of doing this that may be better implented.

View solution in original post

6 REPLIES 6

In generally speakingโ€ฆ you want to calculate an average using top 10 values from the same table, correct?

That is the goal although old rows must be deletable or editable, and the dataset must adjust all related records dynamically.

A virtual column would do this of course but I have found it to be too slow.

How about if you make that virtual calculation in another table (only one row)?

By the nature of the calculation, it will require multiple rows. One row for every row in the table, sort of as described here: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/

Maybe it will provide more clarity; here is what I am producing for my users:

The bigger problem is AppSheets operations for filtering, ordering, and selecting data from a list only returns a list of primary keys. There isnt a simple way to convert this list of primary keys back to a piece of data stored along with them. Its unfortunate that we canโ€™t specifically return a column when using operators like ORDERBY(). Rarely am I wanting a list of primary keys, i usually want some associated value.


Iโ€™ve found a solution to streamline the calculation, although its clunky in implementation it works.

The solution requires two physical columns: one column is EnumList base type REF, and holds a list of the 10 related records I would like to average. Essentially, this part of the bigger formula:

TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)

Then, I have a second formula which looks up each associated value directly, rather than using IN(). Essentially,

AVERAGE(LIST(
   [value],
   LOOKUP(INDEX([_THISROW].[EnumListREF],1),โ€tableโ€,โ€idโ€,โ€valueโ€),
   ...,
   LOOKUP(INDEX([_THISROW].[EnumListREF],9),โ€tableโ€,โ€idโ€,โ€valueโ€)
))

Doing the calculation this way has it performing an order of magnitude better. I still have a workflow rule to update associated records on updates or deletes.


I should also note, in case anyone uses this post in the future, that there are other options. Each has some benefits and downsidesโ€ฆ to name two:

  1. Perform the rolling average calculations on records as they are created, but rely on a time-scheduled report to re-validate the accuracy of rolling averages once a day. Users will not be impacted by the performance of the formula, but they may have to live with temporarily invalid data whenever they edit old records.
  2. Perform the rolling average calculation in the backend database. In excel or spreadsheet implementations, the formula becomes complex and slow unless you can guarantee the data is being saved in the proper order. Definitely possible in relational db; how well it performs Iโ€™m not sure. Maybe could be handled with a table view.

Cheers @Aleksi ! Would love to know if you have any other methods of doing this that may be better implented.

How aboutโ€ฆ
AVERAGE(SELECT(Table[Value],
AND(
[_ROWNUMBER]<=[_THISROW].[_ROWNUMBER],
[_ROWNUMBER]>=[_THISROW].[_ROWNUMBER]-10
)))

That would be sufficient in the simplest case, however:

  1. I am using a SQL DB and a clustered index on columns other than datetime, so the order is not guaranteed to be sequential, by date.
  2. I have an additional filter which I excluded from my example formulas, so my select statement is actually an AND(). Meaning, the [_ROWNUMBER] range would have to be dynamic, and expand if the AND() select formula excluded records for some reason.
Top Labels in this Space